Apparatus and method to segment object from image
US-2017091951-A1 · Mar 30, 2017 · US
US10467508B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10467508-B2 |
| Application number | US-201815962514-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 25, 2018 |
| Priority date | Oct 6, 2015 |
| Publication date | Nov 5, 2019 |
| Grant date | Nov 5, 2019 |
Official abstract text for this publication.
Font recognition and similarity determination techniques and systems are described. In a first example, localization techniques are described to train a model using machine learning (e.g., a convolutional neural network) using training images. The model is then used to localize text in a subsequently received image, and may do so automatically and without user intervention, e.g., without specifying any of the edges of a bounding box. In a second example, a deep neural network is directly learned as an embedding function of a model that is usable to determine font similarity. In a third example, techniques are described that leverage attributes described in metadata associated with fonts as part of font recognition and similarity determinations.
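The abstract does not say how the embedding function in the second example is trained. Claims 13 and 19 of this publication mention training images organized as tuples to minimize a hinge loss, and one common form of that idea is a triplet hinge loss over embeddings. The following is a minimal sketch under that assumption only. The squared Euclidean distance, the `margin` value, and every function name are hypothetical choices for illustration and are not taken from the patent.

```python
from typing import Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_hinge_loss(anchor: Vector, positive: Vector, negative: Vector,
                       margin: float = 1.0) -> float:
    """Hinge loss for one (anchor, positive, negative) tuple.

    The loss is zero once the negative (dissimilar font) is farther from the
    anchor than the positive (similar font) by at least `margin`.
    `margin` is an assumed hyperparameter, not a value from the patent.
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, margin + gap)


# Hypothetical 2-D embeddings of three text images.
anchor = [0.0, 0.0]
similar_font = [0.0, 1.0]
different_font = [3.0, 0.0]

well_ordered = triplet_hinge_loss(anchor, similar_font, different_font)
mis_ordered = triplet_hinge_loss(anchor, different_font, similar_font)
```

In this toy case the well-ordered tuple contributes no loss, while the tuple with the roles swapped is penalized, which is the behavior a similarity embedding is usually trained toward.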
Opening claim text (preview).
What is claimed is:

1. In a digital medium environment to improve image font recognition through use of text localization, a method implemented by one or more computing devices comprising: obtaining a model, by the one or more computing devices, that is trained using machine learning as applied to a plurality of training images having text rendered using a corresponding font; predicting a bounding box, automatically and without user intervention by the one or more computing devices, for text in an image received using the obtained model by forming a plurality of cropped portions of the image and processing each of the plurality of cropped portions of the image by the model independently, one to another, the text overlapping a first and second cropped portion of the plurality of cropped portions; and generating an indication of the predicted bounding box by the one or more computing devices based on a result of the processing of each of the plurality of cropped portions of the image by calculating an average or a median of a top and bottom line of the predicted bounding box, the indication usable to specify a region of the image that includes the text having a font to be recognized.
2. The method as described in claim 1, further comprising recognizing the font of the text in the received image by the one or more computing devices using the generated indication of the predicted bounding box.
3. The method as described in claim 1, wherein the predicting includes processing each of the plurality of cropped portions of the image by a trained convolutional network of the model independently, one to another.
4. The method as described in claim 1, wherein the predicting includes resizing the image by the one or more computing devices to correspond to an image size of the model.
5. The method as described in claim 1, further comprising training the model by the one or more computing devices using the machine learning for a plurality of iterations.
6. The method as described in claim 5, wherein the training is performed for at least one of the plurality of iterations using the plurality of training images having text rendered using the corresponding font and performed for one or more subsequent ones of the plurality of iterations in which one or more perturbations are introduced to the training images.
7. The method as described in claim 6, wherein the perturbations includes at least one of noise, rotation, scale, shading, rotation, kerning, or cropping.
8. The method as described in claim 5, wherein the machine learning is performed by the one or more computing devices using a convolutional neural network, the convolutional neural network is used as an architecture of the machine learning by the one or more computing devices and stochastic gradient decent is used as a training algorithm of the machine learning by the one or more computing devices.
9. The method as described in claim 1, wherein the font to be recognized in the image is arbitrary such that the model is trainable without using the font.
10. In a digital medium environment to improve image font recognition through use of text localization, a system comprising: a text localization module implemented at least partially in hardware of at least one computing device to obtain a model that is trained using machine learning as applied to a plurality of training images having text rendered using a corresponding font; a machine learning module implemented at least partially in the hardware of the at least one computing device to predict a bounding box, automatically and without user intervention, for text in an image by forming a plurality of cropped portions of the image and processing each of the plurality of cropped portions of the image independently, one to another, the text overlapping a first and second cropped portion of the plurality of cropped portions; and the text localization module further implemented at least partially in the hardware of the at least one computing device to generate an indication of the predicted bounding box based on a result of the processing of each of the plurality of cropped portions of the image by calculating an average or a median of a top and bottom line of the predicted bounding box, the indication is usable to specify a region of the image that includes the text having a font to be recognized.
11. The system as described in claim 10, wherein the font to be recognized in the image is arbitrary.
12. The system as described in claim 10, further comprising a font similarity and recognition module implemented at least partially in the hardware of the at least one computing device to recognize the font of the text in the image.
13. The system as described in claim 10, wherein the plurality of training images are organized as tuples to minimize a hinge loss function.
14. The system as described in claim 10, wherein at least one training image of the plurality of training images is samples with a probability distribution that includes a normalization factor.
15. In a digital medium environment to improve image font recognition through use of text localization, a system comprising: means for obtaining a model that is trained using machine learning as applied to a plurality of training images having text rendered using a corresponding font; means for predicting a bounding box, automatically and without user intervention, for text in an image received using the obtained model by forming a plurality of cropped portions of the image and processing each of the plurality of cropped portions of the image by the model independently, one to another, the text overlapping a first and second cropped portion of the plurality of cropped portions; and means for generating an indication of the predicted bounding box based on a result of the processing of each of the plurality of cropped portions of the image by calculating an average or a median of a top and bottom line of the predicted bounding box, the indication is usable to specify a region of the image that includes the text having a font to be recognized.
16. The system as described in claim 15, further comprising means for recognizing the font in the image.
17. The system as described in claim 15, wherein the font to be recognized in the image is arbitrary such that the model is trainable without using the font.
18. The system as described in claim 15, further comprising means for generating the predicted bounding box.
19. The system as described in claim 15, wherein the plurality of training images are organized as tuples to minimize a hinge loss function.
20. The system as described in claim 15, wherein at least one training image of the plurality of training images is samples with a probability distribution that includes a normalization factor.
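The independent claims share one localization step: form several cropped portions of the image, run the model on each portion independently, then combine the per-crop results by taking an average or a median of the top and bottom line of the bounding box. The sketch below illustrates that aggregation only. The horizontal sliding-window cropping, the `crop_width` and `stride` parameters, and `toy_model` (a stand-in for the trained network that simply finds the first and last row containing ink) are assumptions for illustration and are not described in the claims.

```python
from statistics import mean, median
from typing import Callable, List, Sequence, Tuple

Image = Sequence[Sequence[float]]   # grayscale image as rows of pixel values
Lines = Tuple[float, float]         # (top line, bottom line) as row positions


def crop_columns(image: Image, crop_width: int, stride: int) -> List[List[List[float]]]:
    """Form horizontally overlapping, full-height crops of the image."""
    width = len(image[0])
    last_start = max(width - crop_width, 0)
    starts = list(range(0, last_start + 1, stride))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the right edge is covered
    return [[list(row[s:s + crop_width]) for row in image] for s in starts]


def predict_text_lines(image: Image, model: Callable[[Image], Lines],
                       crop_width: int, stride: int,
                       use_median: bool = True) -> Lines:
    """Run the model on each crop independently, then combine the results."""
    predictions = [model(crop) for crop in crop_columns(image, crop_width, stride)]
    combine = median if use_median else mean
    top = combine([p[0] for p in predictions])
    bottom = combine([p[1] for p in predictions])
    return (top, bottom)


def toy_model(crop: Image) -> Lines:
    """Hypothetical stand-in for the trained model: rows that contain ink."""
    rows = [i for i, row in enumerate(crop) if any(v > 0 for v in row)]
    if not rows:
        return (0.0, float(len(crop) - 1))
    return (float(rows[0]), float(rows[-1]))


# 6 x 8 image: text occupies rows 2-3 across the full width, so it overlaps
# every crop; a stray mark at row 1, column 0 disturbs only the first crop.
image = [[0.0] * 8 for _ in range(6)]
for r in (2, 3):
    image[r] = [1.0] * 8
image[1][0] = 1.0

median_lines = predict_text_lines(image, toy_model, crop_width=4, stride=2)
mean_lines = predict_text_lines(image, toy_model, crop_width=4, stride=2,
                                use_median=False)
```

With three crops, the stray mark shifts only one prediction. The median ignores that outlier, while the average is pulled toward it, which is one practical reason to offer both ways of combining the results.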
Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title
Classification techniques · CPC title
using neural networks · CPC title
Combinations of networks · CPC title
Distances to cluster centroïds · CPC title