Automatic locale determination for electronic documents

US9858258B1 · US · B1

Patent metadata
Field               Value
Publication number  US-9858258-B1
Application number  US-201615282350-A
Country             US
Kind code           B1
Filing date         Sep 30, 2016
Priority date       Sep 30, 2016
Publication date    Jan 2, 2018
Grant date          Jan 2, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Automatic locale determination for documents is described. In an embodiment, a computer server receives an electronic document comprising a plurality of unknown-language data elements each associated with one or more types. Based on a document schema of the document, the computer system selects one or more unknown-language data elements from the plurality of unknown-language data elements and assigns to each of the one or more unknown-language data elements a corresponding weight value based on a respective type of the unknown-language data element. The computer system compares the one or more unknown-language data elements with a plurality of known-language data elements that are associated with the document schema and, based on the comparing, determines a number of unknown-language data elements in the one or more unknown-language data elements that matched any in a subset of the plurality of known-language data elements, wherein the subset of known-language data elements corresponds to a particular language. Based on the number of data elements that matched to the subset of known-language data elements and based on the corresponding weight assigned to each unknown-language data element in the number of unknown-language data elements, the computer system determines a language confidence level value specifying a level of machine confidence that the document is expressed in the particular language and, based on the language confidence value for the particular language exceeding a language threshold value, automatically processes the document using the particular language.
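The scoring flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the weight table, the known-language word sets, the `Element` type, and the choice to compute confidence as the sum of the weights of matched elements are all assumptions made for illustration. The abstract only states that confidence depends on the number of matches and on their weights.

```python
from dataclasses import dataclass

# Hypothetical weights per element type (assumption: field names count more than values).
TYPE_WEIGHTS = {"field_name": 2.0, "data_value": 1.0}

# Hypothetical known-language element sets for one document schema, keyed by language.
KNOWN_ELEMENTS = {
    "en": {"invoice", "quantity", "price", "total"},
    "de": {"rechnung", "menge", "preis", "summe"},
}


@dataclass(frozen=True)
class Element:
    """An unknown-language data element selected from the document."""
    text: str
    type: str  # "field_name" or "data_value"


def language_confidence(elements, language):
    """Sum the weights of the elements that match the known set for one language."""
    known = KNOWN_ELEMENTS[language]
    matched = [e for e in elements if e.text.lower() in known]
    return sum(TYPE_WEIGHTS[e.type] for e in matched)


def detect_language(elements, threshold):
    """Return (language or None, all scores); a language is chosen only above the threshold."""
    scores = {lang: language_confidence(elements, lang) for lang in KNOWN_ELEMENTS}
    best = max(scores, key=scores.get)
    return (best if scores[best] > threshold else None), scores


elements = [
    Element("Rechnung", "field_name"),
    Element("Menge", "field_name"),
    Element("Preis", "data_value"),
    Element("total", "field_name"),
]
language, scores = detect_language(elements, threshold=3.0)
print(language, scores)
```

With these made-up inputs, three elements match the German set and one matches the English set, so only German clears the threshold. Returning `None` when no score exceeds the threshold leaves room for a fallback, which the abstract does not specify.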

First claim

Opening claim text (preview).

What is claimed is:

1. A data processing method comprising: receiving, at a server computer, an electronic document comprising a plurality of unknown-language data elements each associated with one or more types; based on a document schema of the document, selecting one or more unknown-language data elements from the plurality of unknown-language data elements; assigning to each of the one or more unknown-language data elements a corresponding weight value based on a respective type of the unknown-language data element; comparing the one or more unknown-language data elements with a plurality of known-language data elements that are associated with the document schema; based on the comparing, determining a number of unknown-language data elements in the one or more unknown-language data elements that matched any in a subset of the plurality of known-language data elements, wherein the subset of known-language data elements corresponds to a particular language; based on the number of unknown-language data elements in the one or more unknown-language data elements that matched to the subset of known-language data elements and based on the corresponding weight value assigned to each unknown-language data element in the number of unknown-language data elements, determining a language confidence level value specifying a level of machine confidence that the document is expressed in the particular language; based on the language confidence level value for the particular language exceeding a language threshold value, automatically processing the document using the particular language.

2. The method of claim 1, further comprising: receiving the document as part of receiving a request to process the document, the request comprising one or more additional data elements; selecting an additional data element that indicates possible language for the request, the additional data element assigned to a particular weight; based on a data value of the additional data element and the particular weight, adjusting the language confidence level value for the document.

3. The method of claim 1, wherein the respective type of the unknown-language data element is a data field name of the unknown-language data element or a data value of the unknown-language data element of the document.

4. The method of claim 1, wherein selecting one or more unknown-language data elements from the plurality of unknown-language data elements is further based on a document type of the document.

5. The method of claim 1, wherein the document schema of the document depends on a type of structured data included in the document, and wherein the type of the structured data is one or more of XML (Extensible Markup Language), JSON (JavaScript Object Notation), cXML (commerce eXtensible Markup Language), IDoc (Intermediate Document), CSV (Comma Separated values), or ODF (Open Document).

6. The method of claim 1, further comprising: storing the plurality of known-language data elements associated with the document schema of the document in a data store in a plurality of language sets of known-language data elements, each set of known-language data elements corresponding to a supported language in a plurality of supported languages that includes the particular language; comparing the one or more unknown-language data elements with one or more known-language data elements in said each set of known-language data elements to determine corresponding number of unknown-language data elements that matched for the corresponding supported language.

7. The method of claim 1, wherein the comparing further comprises stemming the one or more unknown-language data elements to match with the plurality of known-language data elements.

8. The method of claim 1, further comprising: based on the document schema of the document, selecting at least one unknown-language data element of the plurality of unknown-language data elements such that the at least one unknown-language data element has a data value that can vary in formats based on a locale of the document; based on a format of the data value, determining a locale confidence level value for the document.

9. The method of claim 8, wherein the format of the data value is based at least on one of the following: a date format, a number format, or a currency value format.

10. The method of claim 1, further comprising determining the threshold language value based on a maximum language confidence value possible for the document.

11. The method of claim 1, further comprising determining the language threshold value based on a plurality of language confidence level values, for a plurality of languages, determined for the document that includes the language confidence level value.

12. The method of claim 1, further comprising: automatically determining that a file that includes the document is compressed; in response to automatically determining that the file that includes the document is compressed, automatically decompressing the file to extract the document.

13. The method of claim 1, further comprising: automatically determining that the document is encrypted; in response to automatically determining that the document is encrypted, automatically decrypting the document.

14. A data-processing method comprising: using a first computer, obtaining from one or more non-transitory computer-readable data storage media a copy of one or more sequences of instructions that are stored on the media and are arranged, when executed using a second computer among a plurality of other computers to cause the second computer to perform: using a computer, receiving an electronic document comprising a plurality of unknown-language data elements each associated with one or more types; using the computer, based on a document schema of the document, selecting one or more unknown-language data elements from the plurality of unknown-language data elements; using the computer, assigning to each of the one or more unknown-language data elements a corresponding weight value based on a respective type of the unknown-language data element; using the computer, comparing the one or more unknown-language data elements with a plurality of known-language data elements that are associated with the document schema; using the computer, based on the comparing, determining a number of unknown-language data elements in the one or more unknown-language data elements that matched any in a subset of the plurality of known-language data elements, wherein the subset of known-language data elements corresponds to a particular language; using the computer, based on the number of unknown-language data elements in the one or more unknown-language data elements that matched to the subset of known-language data elements and based on the corresponding weight value assigned to each unknown-language data element in the number of unknown-language data elements, determining a language confidence level value specifying a level of machine confidence that the document is expressed in the particular language; using the computer, based on the language confidence level value for the particular language exceeding a language threshold value, automatically processing the document using the particular language.

15. The method of claim 14, further comprising: receiving the document as part of receiving a request to process the document, the request comprising one or more additional data elements; selecting an additional data element that indicates possible language for the request, the additional data element assigned to a particular weight; based on a data value
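
Claims 8 and 9 describe deriving a locale confidence level from the format of locale-sensitive values such as dates, numbers, and currency amounts. The Python sketch below shows one way that idea might look. The locale names, the regular expressions, and the scoring rule (fraction of values whose format fits a locale) are assumptions for illustration and are not taken from the patent.

```python
import re

# Hypothetical format patterns per locale (illustrative, far from exhaustive).
LOCALE_PATTERNS = {
    "en-US": [
        re.compile(r"\d{1,2}/\d{1,2}/\d{4}"),          # date such as 12/31/2016
        re.compile(r"\$?\d{1,3}(,\d{3})*\.\d{2}"),     # number or currency such as $1,234.56
    ],
    "de-DE": [
        re.compile(r"\d{1,2}\.\d{1,2}\.\d{4}"),        # date such as 31.12.2016
        re.compile(r"\d{1,3}(\.\d{3})*,\d{2}( ?€)?"),  # number or currency such as 1.234,56 €
    ],
}


def locale_confidence(values, locale):
    """Fraction of values whose whole text matches some format pattern of the locale."""
    if not values:
        return 0.0
    patterns = LOCALE_PATTERNS[locale]
    hits = sum(1 for v in values if any(p.fullmatch(v) for p in patterns))
    return hits / len(values)


values = ["31.12.2016", "1.234,56", "42"]
for locale in LOCALE_PATTERNS:
    print(locale, locale_confidence(values, locale))
```

A plain integer such as `42` fits neither locale, so it lowers both scores equally. Formats alone can also be ambiguous (a value like `01/02/2016` is a valid date in more than one locale), which is one reason a format-based score is better treated as evidence to combine with other signals than as a final answer.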

Assignees

Coupa Software Inc

Inventors

Classifications

  • Search customisation based on user profiles and personalisation · CPC title

  • Recognition of textual entities · CPC title

  • Indexing, e.g. XML tags; Data structures therefor; Storage structures · CPC title

  • Coding or compression of tree-structured data · CPC title

  • Distributed file systems · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9858258B1 cover?
Automatic locale determination for documents is described. In an embodiment, a computer server receives an electronic document comprising a plurality of unknown-language data elements each associated with one or more types. Based on a document schema of the document, the computer system selects one or more unknown-language data elements from the plurality of unknown-language data elements and a…
Who is the assignee on this patent?
Coupa Software Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/263. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 2, 2018 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).