forked from vsymbol/CUTIE

CUTIE (TensorFlow implementation of Convolutional Universal Text Information Extractor)


4kssoft/CUTIE

 
 


CUTIE

TensorFlow implementation of the paper "CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor" (Xiaohui Zhao et al., arXiv 2019).

Results

Results are evaluated on 4,484 receipt documents, including taxi receipts, meals entertainment receipts, and hotel receipts, with 9 different key information classes. Scores are reported as AP / softAP.

| Method    | #Params | Taxi        | Hotel       |
|-----------|---------|-------------|-------------|
| CloudScan | -       | 82.0 / -    | 60.0 / -    |
| BERT      | 110M    | 88.1 / -    | 71.7 / -    |
| CUTIE     | 14M     | 94.0 / 97.3 | 74.6 / 87.0 |
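
This README does not define AP and softAP. The sketch below shows one plausible reading, and the definitions in it are assumptions rather than the repository's evaluation code: a key information class counts as correct under AP only when the predicted token set matches the ground truth exactly, and under softAP when every ground-truth token is predicted, even if extra tokens are included. The function names and the dict-of-sets input layout are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# One document = (predicted tokens per class, ground-truth tokens per class).
# Token ids are arbitrary integers; this layout is an assumption for illustration.
Doc = Tuple[Dict[str, Set[int]], Dict[str, Set[int]]]


def class_correct_strict(pred: Set[int], truth: Set[int]) -> bool:
    """Assumed AP criterion: prediction must equal the ground truth exactly."""
    return pred == truth


def class_correct_soft(pred: Set[int], truth: Set[int]) -> bool:
    """Assumed softAP criterion: all ground-truth tokens found, false positives tolerated."""
    return truth <= pred


def score(docs: List[Doc], soft: bool = False) -> float:
    """Fraction of (document, class) pairs judged correct under the chosen criterion."""
    check = class_correct_soft if soft else class_correct_strict
    hits = 0
    total = 0
    for pred_by_class, truth_by_class in docs:
        for cls, truth_tokens in truth_by_class.items():
            pred_tokens = pred_by_class.get(cls, set())
            hits += int(check(pred_tokens, truth_tokens))
            total += 1
    return hits / total if total else 0.0


example_docs: List[Doc] = [
    # "total" has one extra predicted token (a false positive); "date" is exact.
    ({"total": {1, 2}, "date": {5}}, {"total": {1}, "date": {5}}),
]
strict_score = score(example_docs, soft=False)
soft_score = score(example_docs, soft=True)
```

Under these assumed definitions the soft score can never be lower than the strict one, which is consistent with the softAP values in the table being higher than the AP values.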

[Figures: Taxi, Hotel]

Installation & Usage

pip install -r requirements.txt
  1. Generate your own dictionary with main_build_dict.py / main_data_tokenizer.py
  2. Train your model with main_train_json.py

CUTIE achieves its best performance when rows/cols are well configured. For more insight, refer to the statistics in others/TrainingStatistic.xlsx.
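
To give a feel for why rows/cols matter, here is a minimal sketch of mapping text boxes onto a fixed-size grid by their relative position. It is an illustration under stated assumptions, not the code in this repository: the `build_grid` function, the `(text, x_min, y_min, x_max, y_max)` box layout, and the shift-right collision rule are all hypothetical.

```python
from typing import List, Tuple

# (text, x_min, y_min, x_max, y_max); hypothetical layout, not the repo's data format.
Box = Tuple[str, float, float, float, float]


def build_grid(boxes: List[Box], rows: int, cols: int, pad: str = "") -> List[List[str]]:
    """Place each text box in the grid cell that contains its center point."""
    grid = [[pad] * cols for _ in range(rows)]
    if not boxes:
        return grid

    # Bounding region of all boxes, used to normalize positions to [0, 1].
    x0 = min(b[1] for b in boxes)
    y0 = min(b[2] for b in boxes)
    width = max(max(b[3] for b in boxes) - x0, 1e-9)
    height = max(max(b[4] for b in boxes) - y0, 1e-9)

    for text, bx0, by0, bx1, by1 in boxes:
        cx = (bx0 + bx1) / 2
        cy = (by0 + by1) / 2
        col = min(int((cx - x0) / width * cols), cols - 1)
        row = min(int((cy - y0) / height * rows), rows - 1)
        # Assumed collision rule: move right to the next free cell in the same row.
        while col < cols and grid[row][col] != pad:
            col += 1
        if col == cols:
            raise ValueError("grid too small for this document; increase rows/cols")
        grid[row][col] = text
    return grid


receipt = [
    ("TAXI", 0, 0, 40, 10),
    ("RECEIPT", 50, 0, 100, 10),
    ("Total", 0, 90, 30, 100),
    ("12.50", 70, 90, 100, 100),
]
grid = build_grid(receipt, rows=2, cols=4)
```

In this sketch a grid that is too small forces texts to collide or fail to fit, while a grid that is too large leaves most cells empty, which is one way to think about the rows/cols trade-off.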

[Figure: Chart]

TLDR

For information about the input example, refer to the issue discussion.

The project has been refreshed with all history removed. All programs are runnable, except that the data example is not uploaded. Since the project was built at my previous workplace, the data format cannot be uploaded without permission right now. However, you may infer the correct data format from the data_loader_json.py file. Pull requests that make the project runnable out of the box are welcome.
