Generating training data from a machine learning model to identify offensive language
US-2020125639-A1 · Apr 23, 2020 · US
US11461441B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11461441-B2 |
| Application number | US-201916401616-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 2, 2019 |
| Priority date | May 2, 2019 |
| Publication date | Oct 4, 2022 |
| Grant date | Oct 4, 2022 |
Official abstract text for this publication.
Techniques are provided for machine learning-based anomaly detection in a monitored location. One method comprises obtaining data from multiple data sources associated with a monitored location for storage into a data repository; processing the data to generate substantially continuous time-series data for multiple distinct features within the data; applying the substantially continuous time-series data for the distinct features to a machine learning baseline behavioral model to obtain a probability distribution representing a behavior of the monitored location over time; and evaluating a probability score generated by the machine learning baseline behavioral model to identify an anomaly at the monitored location. The machine learning baseline behavioral model is trained, for example, to identify anomalies in correlations between the plurality of distinct features at each timestamp. A presence verification is optionally provided based on a deviation from the machine learning baseline behavioral model at the monitored location.
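The flow in the abstract (raw data, continuous time-series per feature, baseline model, probability score, anomaly decision) can be illustrated with a small sketch. It is not the patented implementation. The feature names, the hourly bucket width, the per-hour independent Gaussian baseline, the variance floor, and the threshold value are all assumptions chosen for illustration. Because each feature is scored independently here, the sketch does not capture the cross-feature correlations the abstract mentions.

```python
from collections import defaultdict
from datetime import datetime, timedelta
import math

# Hypothetical feature names; the patent does not fix a feature set.
FEATURES = ["motion", "door", "wifi_devices"]


def to_time_series(events, start, end, step):
    """Bucket (timestamp, feature) events into fixed-width counters.

    Every bucket in [start, end) gets a vector, with empty buckets counted
    as zero, so the output has no gaps even when the raw events do.
    """
    counts = defaultdict(lambda: [0] * len(FEATURES))
    for ts, feature in events:
        if start <= ts < end:
            bucket = start + step * ((ts - start) // step)
            counts[bucket][FEATURES.index(feature)] += 1
    series = []
    t = start
    while t < end:
        series.append((t, list(counts[t])))
        t += step
    return series


def fit_baseline(series, variance_floor=0.25):
    """Learn a mean and variance per feature for each hour of the day.

    The variance floor (an assumed value) avoids division by zero for
    hours in which a feature never varied during training.
    """
    by_hour = defaultdict(list)
    for t, vec in series:
        by_hour[t.hour].append(vec)
    model = {}
    for hour, vecs in by_hour.items():
        n = len(vecs)
        means = [sum(v[i] for v in vecs) / n for i in range(len(FEATURES))]
        variances = [
            max(sum((v[i] - means[i]) ** 2 for v in vecs) / n, variance_floor)
            for i in range(len(FEATURES))
        ]
        model[hour] = (means, variances)
    return model


def log_score(model, t, vec):
    """Gaussian log-density of a feature vector under the baseline for t's hour."""
    means, variances = model[t.hour]
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        for x, m, var in zip(vec, means, variances)
    )


def is_anomaly(model, t, vec, threshold=-10.0):
    """Flag the vector when its score falls below an assumed, predefined threshold."""
    return log_score(model, t, vec) < threshold
```

With a baseline trained on a location that is only active around 9:00, a burst of motion at 3:00 scores far below the threshold, while the same burst at 9:00 does not.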
Opening claim text (preview).
What is claimed is:
1. A method, comprising: obtaining data from a plurality of data sources associated with a monitored physical location for storage into a data repository; processing the data to generate substantially continuous time-series data for a plurality of distinct features within the data; applying the substantially continuous time-series data for the plurality of distinct features to at least one machine learning baseline behavioral model to obtain a probability distribution representing a behavior of the monitored physical location over time, wherein the at least one machine learning baseline behavioral model is trained to learn a baseline behavior comprising one or more expected times of at least one expected occupant at the monitored physical location, wherein an unexpected occupant at the monitored physical location at a given time is identified based on a deviation of the unexpected occupant at the monitored physical location at the given time from the learned one or more expected times of the at least one expected occupant at the monitored physical location in the at least one machine learning baseline behavioral model, and wherein the probability distribution comprises a multi-dimensional probability distribution representing one or more human properties, wherein the multi-dimensional probability distribution takes into account (i) a temporal pattern behavior of each of the plurality of distinct features related to the one or more human properties and (ii) temporal correlations between feature values of at least two of the plurality of distinct features, related to the one or more human properties, at each timestamp, wherein the at least one machine learning baseline behavioral model is further trained to treat a presence of a given expected occupant at the monitored physical location at a different time than the one or more expected times for the given expected occupant as a non-anomalous event; and evaluating a probability score generated by the at least one machine learning baseline behavioral model to identify an anomaly at the monitored physical location; wherein the method is performed by at least one processing device comprising a processor coupled to a memory.
2. The method of claim 1, wherein the plurality of data sources comprises one or more of sensor devices at the monitored location, physiological sensor devices for one or more humans at the monitored location, a network device associated with the monitored location and a smart appliance at the monitored location.
3. The method of claim 1, wherein the substantially continuous time-series data for the plurality of distinct features is applied to the at least one machine learning baseline behavioral model as a data vector with a value for each distinct feature for a given timestamp.
4. The method of claim 1, wherein the step of evaluating the probability score further comprises comparing the probability score to one or more predefined thresholds.
5. The method of claim 1, wherein the step of evaluating the probability score further comprises the step of evaluating the probability score for each of the distinct features.
6. The method of claim 1, wherein the processing step further comprises applying at least one function to the data to obtain a plurality of time-series counters for the plurality of distinct features within the data.
7. The method of claim 1, wherein the applying the substantially continuous time-series data for the plurality of distinct features to the at least one machine learning baseline behavioral model comprises applying a difference, between a predicted value of a given feature by a given machine learning baseline behavioral model and a measured value of the given feature, to an aggregate model.
8. The method of claim 1, wherein the temporal pattern behavior of each of the plurality of distinct features is used to identify at least one anomaly in one or more of a given distinct feature and a plurality of the distinct features.
9. The method of claim 1, wherein the at least one machine learning baseline behavioral model comprises a different machine learning model for each of the plurality of distinct features within the data and an additional aggregated machine learning model that aggregates an output of each of the different machine learning models for each of the plurality of distinct features.
10. A computer program product, comprising a tangible machine-readable storage medium having encoded therein executable code of one or more software programs, wherein the one or more software programs when executed by at least one processing device perform the following steps: obtaining data from a plurality of data sources associated with a monitored physical location for storage into a data repository; processing the data to generate substantially continuous time-series data for a plurality of distinct features within the data; applying the substantially continuous time-series data for the plurality of distinct features to at least one machine learning baseline behavioral model to obtain a probability distribution representing a behavior of the monitored physical location over time, wherein the at least one machine learning baseline behavioral model is trained to learn a baseline behavior comprising one or more expected times of at least one expected occupant at the monitored physical location, wherein an unexpected occupant at the monitored physical location at a given time is identified based on a deviation of the unexpected occupant at the monitored physical location at the given time from the learned one or more expected times of the at least one expected occupant at the monitored physical location in the at least one machine learning baseline behavioral model, and wherein the probability distribution comprises a multi-dimensional probability distribution representing one or more human properties, wherein the multi-dimensional probability distribution takes into account (i) a temporal pattern behavior of each of the plurality of distinct features related to the one or more human properties and (ii) temporal correlations between feature values of at least two of the plurality of distinct features, related to the one or more human properties, at each timestamp, wherein the at least one machine learning baseline behavioral model is further trained to treat a presence of a given expected occupant at the monitored physical location at a different time than the one or more expected times for the given expected occupant as a non-anomalous event; and evaluating a probability score generated by the at least one machine learning baseline behavioral model to identify an anomaly at the monitored physical location.
11. The computer program product of claim 10, wherein the substantially continuous time-series data for the plurality of distinct features is applied to the at least one machine learning baseline behavioral model as a data vector with a value for each distinct feature for a given timestamp.
12. The computer program product of claim 10, wherein the step of evaluating the probability score further comprises the step of evaluating the probability score for each of the distinct features.
13. The computer program product of claim 10, wherein the applying the substantially continuous time-series data for the plurality of distinct features to the at least one machine learning baseline behavioral model comprises applying a difference, between a predicted value of a given feature by a given machine learning baseline behavioral model and a measured value of the given feature, to an aggregate model.
14. The computer program product of claim 10, whe
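Claims 7 and 9 describe one model per feature plus an aggregate model that receives the difference between each predicted and measured value. The sketch below shows one way such an arrangement could look for two features. It is an assumption-laden illustration, not the claimed implementation: `HourlyMeanModel`, `residual_vector`, and `ResidualAggregate` are hypothetical names, the per-feature model is a plain hourly mean, and the aggregate is a two-dimensional Gaussian scored with a squared Mahalanobis distance so that a break in the usual correlation between the two features stands out.

```python
from statistics import mean


class HourlyMeanModel:
    """Hypothetical per-feature model: predicts the mean seen at each hour of day."""

    def fit(self, hours, values):
        grouped = {}
        for hour, value in zip(hours, values):
            grouped.setdefault(hour, []).append(value)
        self.prediction = {hour: mean(vals) for hour, vals in grouped.items()}
        return self

    def predict(self, hour):
        return self.prediction[hour]


def residual_vector(models, hour, measured):
    """Measured minus predicted value, one entry per feature."""
    return tuple(x - m.predict(hour) for m, x in zip(models, measured))


class ResidualAggregate:
    """Hypothetical aggregate model: a 2-D Gaussian fitted to residual pairs."""

    def fit(self, residuals, eps=1e-6):
        n = len(residuals)
        mx = mean(r[0] for r in residuals)
        my = mean(r[1] for r in residuals)
        self.mu = (mx, my)
        # eps on the diagonal keeps the covariance matrix invertible even
        # when the two residual streams are perfectly correlated.
        self.sxx = sum((r[0] - mx) ** 2 for r in residuals) / n + eps
        self.syy = sum((r[1] - my) ** 2 for r in residuals) / n + eps
        self.sxy = sum((r[0] - mx) * (r[1] - my) for r in residuals) / n
        return self

    def distance(self, residual):
        """Squared Mahalanobis distance of a residual pair from the baseline."""
        dx = residual[0] - self.mu[0]
        dy = residual[1] - self.mu[1]
        det = self.sxx * self.syy - self.sxy ** 2
        return (self.syy * dx * dx - 2 * self.sxy * dx * dy + self.sxx * dy * dy) / det
```

If the two features normally rise and fall together, a residual pair such as `(2, 2)` stays close to the baseline, while `(2, -2)` yields a much larger distance even though each value on its own is within the usual range. A distance cut-off would play the role of the predefined threshold in claim 4.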
Classification, e.g. identification · CPC title
of input or preprocessed data · CPC title
Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands · CPC title
using classification, e.g. of video objects · CPC title
Machine learning · CPC title