Citation and policy based document classification

US2021390488A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2021390488-A1
Application number: US-202117345907-A
Country: US
Kind code: A1
Filing date: Jun 11, 2021
Priority date: Jun 11, 2020
Publication date: Dec 16, 2021
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are system, method, and computer program product embodiments for rapid identification and access to relevant regulatory documents. A data model relating regulatory mandates and requirements to citations appearing within an enforcement document is used to rapidly access specific citations within an enforcement document. In the case of image-based enforcement documents, the originality of these documents is preserved while allowing a user to see where the relevant citations appear in the document images. The relevant citations are further compared to business policies to identify potential impacts of regulatory mandates and requirements.

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: extracting, by at least one processor, a citation from a document; parsing, by the at least one processor, the citation into first one or more sub-citations; comparing, by the at least one processor, the first one or more sub-citations to second one or more sub-citations of a reference citation in a policy; and classifying, by the at least one processor, the document as relevant to the policy based on the first one or more sub-citations and the second one or more sub-citations.

2. The method of claim 1, wherein extracting the citation from the document comprises: comparing text in the document to a citation key; and in response to identifying a first string of characters from the text that matches the citation key, extracting a second string of characters comprising the first string of characters, a third string of characters located in the text immediately before the first string and a fourth string of characters located in the text immediately after the first string.

3. The method of claim 1, wherein extracting the citation from the document comprises: comparing text in the document to a citation key, wherein the citation key comprises a first string of characters; and in response to identifying a second string of characters from the text that differs from the citation key by at least one character and having a matching metric greater than a threshold, marking the second string of characters as a citation extraction error.

4. The method of claim 1, wherein: the first one or more sub-citations comprises a first root and a first sub-section; the second one or more sub-citations comprises a second root and a second sub-section; and classifying the citation based on the first one or more sub-citations and the second one or more sub-citations comprises: in response to the first one or more sub-citations matching the second one or more sub-citations, classifying the document as an exact match for the policy; in response to the first root matching the second root and the first subsection matching the second subsection, classifying the document as a subsection match for the policy; in response to the first root matching the second root and the first subsection being different than the second subsection, classifying the document as a root match for the policy; and in response to the first root being different than the second root, classifying the document as a non-match for the policy.

5. The method of claim 1, further comprising: extracting, by the at least one processor, one or more fine amounts from the document, wherein the one or more fine values correspond to the citation; removing, by the at least one processor, each duplicate fine amount from the one or more fine amounts to form a set of fine amounts; and combining, by the at least one processor, the set of fine amounts into a total fine amount corresponding to the citation.

6. The method of claim 1, further comprising: extracting, by the at least one processor, a fine amount from the document, wherein the fine amount comprises a number prefix and a word suffix; and combining, by the at least one processor, the number prefix and the word suffix to form a number amount for the fine amount.

7. The method of claim 1, further comprising: determining, by the at least one processor, a score describing an impact of the document on a business based on the policy, the citation, the classification of the document, and text in the document related to the citation; and in response to the score being greater than a threshold, presenting, by the at least one processor, a summary of the impact of the document on the business, wherein the summary comprises one or more of a link to the policy, the citation, a link to a reference document described by the citation, a classification of the document, a link to the document, or the score.

8. The method of claim 7, further comprising: extracting, by the at least one processor, a total fine amount corresponding to the citation, wherein determining the score describing the impact of the document on the business is further based on the total fine amount; and wherein the summary further comprises the total fine amount.

9. A system, comprising: one or more processors; memory communicatively coupled to the one or more processors, the memory storing instructions which, when executed by the one or more processors, cause the one or more processors to: extract a citation from a document; parse the citation into first one or more sub-citations; compare the first one or more sub-citations to second one or more sub-citations of a reference citation in a policy; and classify the document as relevant to the policy based on the first one or more sub-citations and the second one or more sub-citations.

10. The system of claim 9, wherein the instructions are further configured to extract the citation from the document by: comparing text in the document to a citation key; and in response to identifying a first string of characters from the text that matches the citation key, extracting a second string of characters comprising the first string of characters, a third string of characters located in the text immediately before the first string and a fourth string of characters located in the text immediately after the first string.

11. The system of claim 9, wherein the instructions are further configured to extract the citation from the document by: comparing text in the document to a citation key, wherein the citation key comprises a first string of characters; and in response to identifying a second string of characters from the text that differs from the citation key by at least one character and having a matching metric greater than a threshold, marking the second string of characters as a citation extraction error.

12. The system of claim 9, wherein: the first one or more sub-citations comprises a first root and a first sub-section; the second one or more sub-citations comprises a second root and a second sub-section; and the instructions are further configured to classify the citation based on the first one or more sub-citations and the second one or more sub-citations by: in response to the first one or more sub-citations matching the second one or more sub-citations, classifying the document as an exact match for the policy; in response to the first root matching the second root and the first subsection matching the second subsection, classifying the document as a subsection match for the policy; in response to the first root matching the second root and the first subsection being different than the second subsection, classifying the document as a root match for the policy; and in response to the first root being different than the second root, classifying the document as a non-match for the policy.

13. The system of claim 9, wherein the instructions are further configured to: extract one or more fine amounts from the document, wherein the one or more fine values correspond to the citation; remove each duplicate fine amount from the one or more fine amounts to form a set of fine amounts; and combine the set of fine amounts into a total fine amount corresponding to the citation.

14. The system of claim 9, wherein the instructions are further configured to: extract a fine amount from the document, wherein the fine amount comprises a number prefix and a word suffix; and combine the number prefix and the word suffix to form a number amount for the fine amount.

15. The system of
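
Illustrative sketch (not part of the patent text). Claims 1, 2, and 4 describe extracting a citation around a citation key, splitting it into sub-citations, and grading the match against a policy's reference citation. The Python below is one possible reading under stated assumptions: the key "CFR", the "title CFR part.section(paragraph)" layout, the Citation type, and all function names are hypothetical and do not come from the publication. It also assumes that an "exact match" covers every sub-citation, including levels below the sub-section.

```python
import re
from typing import NamedTuple


class Citation(NamedTuple):
    root: str          # assumed form, e.g. "12 CFR 1026"
    subsection: str    # assumed form, e.g. "19"
    paragraphs: tuple  # deeper levels, e.g. ("e", "1")


# Hypothetical citation key; the pattern also captures the characters
# immediately before and after the key, in the spirit of claim 2.
CITATION_KEY = "CFR"
_PATTERN = re.compile(
    r"(\d+)\s+" + re.escape(CITATION_KEY)
    + r"\s+(\d+)\.(\d+)((?:\([A-Za-z0-9]+\))*)"
)


def extract_citations(text: str) -> list:
    """Return every string in `text` that surrounds the citation key."""
    return [m.group(0) for m in _PATTERN.finditer(text)]


def parse_citation(raw: str) -> Citation:
    """Split a raw citation string into root, sub-section, and paragraphs."""
    m = _PATTERN.fullmatch(raw)
    if m is None:
        raise ValueError(f"not a recognised citation: {raw!r}")
    title, part, section, tail = m.groups()
    paragraphs = tuple(re.findall(r"\(([A-Za-z0-9]+)\)", tail))
    return Citation(f"{title} {CITATION_KEY} {part}", section, paragraphs)


def classify(doc: Citation, ref: Citation) -> str:
    """Grade a document citation against a policy's reference citation."""
    if doc.root != ref.root:
        return "non-match"
    if doc.subsection != ref.subsection:
        return "root match"
    if doc == ref:
        return "exact match"
    return "subsection match"


text = "The bank violated 12 CFR 1026.19(e)(1) and 12 CFR 1005.11."
found = [parse_citation(c) for c in extract_citations(text)]
policy_ref = parse_citation("12 CFR 1026.19(f)")
labels = [classify(c, policy_ref) for c in found]
```

Testing the root first and the full tuple last keeps the four outcomes mutually exclusive, which the claim wording alone does not guarantee when a citation has only a root and a sub-section.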
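
Illustrative sketch (not part of the patent text). Claims 3 and 11 describe flagging a string that differs from the citation key by at least one character yet scores above a threshold on a matching metric. The claims do not name the metric; the sketch below swaps in the similarity ratio from Python's standard difflib module, and the function name, the key "CFR", and the 0.6 threshold are assumptions chosen only for illustration.

```python
from difflib import SequenceMatcher


def find_near_miss_keys(text: str, citation_key: str = "CFR",
                        threshold: float = 0.6) -> list:
    """Return tokens that look like a mistyped or mis-scanned citation key."""
    errors = []
    for token in text.split():
        candidate = token.strip(".,;:()")
        # An exact hit is a normal citation key, not an extraction error.
        if not candidate or candidate == citation_key:
            continue
        score = SequenceMatcher(None, candidate, citation_key).ratio()
        if score > threshold:
            errors.append(candidate)
    return errors


suspects = find_near_miss_keys("See 12 CFR 1026.19 and 12 CFB 1005.11.")
```

A check of this kind could be useful for image-based documents, where character recognition may turn "CFR" into a near variant; the right threshold would depend on the key length and the metric actually used.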
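
Illustrative sketch (not part of the patent text). Claims 5, 6, 13, and 14 describe converting a fine written as a number prefix plus a word suffix into a numeric amount, removing duplicate amounts, and combining the rest into a total. The sketch below assumes US-dollar amounts such as "$2.5 million" and treats two fines as duplicates when their numeric values are equal; the regular expression, the multiplier table, and the function names are hypothetical.

```python
import re
from decimal import Decimal

# Assumed word suffixes and their multipliers.
_MULTIPLIERS = {
    "thousand": Decimal(10) ** 3,
    "million": Decimal(10) ** 6,
    "billion": Decimal(10) ** 9,
}

_FINE = re.compile(
    r"\$\s?(\d[\d,]*(?:\.\d+)?)(?:\s+(thousand|million|billion))?",
    re.IGNORECASE,
)


def extract_fine_amounts(text: str) -> list:
    """Return each dollar amount in `text` as a Decimal, in document order."""
    amounts = []
    for number, word in _FINE.findall(text):
        value = Decimal(number.replace(",", ""))
        if word:
            # Combine the number prefix with the word suffix (claim 6).
            value *= _MULTIPLIERS[word.lower()]
        amounts.append(value)
    return amounts


def total_fine(text: str) -> Decimal:
    """Drop duplicate amounts, then sum what remains (claim 5)."""
    unique = set(extract_fine_amounts(text))
    return sum(unique, Decimal(0))


order = ("The Bureau imposed a $2.5 million penalty. Respondent shall pay "
         "the $2.5 million penalty and $150,000 in restitution.")
total = total_fine(order)
```

Deduplicating by value alone would also merge two genuinely separate fines of the same size, so a fuller implementation would probably need some context around each amount before treating it as a repeat.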

Assignees

Inventors

Classifications

  • Document matching, e.g. of document images · CPC title

  • Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors · CPC title

  • Parsing · CPC title

  • Clustering; Classification · CPC title

  • Phrasal analysis, e.g. finite state techniques or chunking · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021390488A1 cover?
Disclosed herein are system, method, and computer program product embodiments for rapid identification and access to relevant regulatory documents. A data model relating regulatory mandates and requirements to citations appearing within an enforcement document is used to rapidly access specific citations within an enforcement document. In the case of image-based enforcement documents, the origi…
Who is the assignee on this patent?
Capital One Services, LLC
What technology area does this patent fall under?
Primary CPC classification G06Q10/06375. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 16, 2021 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).