A client in the mobile application space wanted to improve usage of one of their core features, a payment splitter, which allowed users to split bills and request payment from other members on the app. The problem was that requesting payments was tedious, and the user experience became extremely cumbersome when splitting many items. The client asked us to build a “receipt parser” that could take in a photo of a receipt, extract all the unique line items, separate the tax and the tip, and integrate all of this into their existing mobile application.
We built a multi-model Computer Vision pipeline that takes an image of a receipt as input and outputs an itemization of all charges tagged with applicable purchase categories.
We utilized the latest model architectures proposed in recent research to achieve very high accuracy in Optical Character Recognition, the first step of the pipeline. When chaining models, it is critical to minimize error as early in the pipeline as possible to avoid compounding error down the chain; for example, three chained stages at 95% accuracy each yield only about 86% end-to-end accuracy. Further, we utilize a geometric model to associate items with their costs and quantities, and to correct for variables such as glare and image skew.
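The geometric association step can be illustrated with a minimal sketch: on a receipt, an item name and its price usually sit on the same visual row, so OCR bounding boxes can be paired by vertical proximity. The `TextBox` structure, the `associate_items_with_prices` function, and the tolerance value below are all hypothetical simplifications, not the production model.

```python
from dataclasses import dataclass

@dataclass
class TextBox:
    """A piece of OCR output with the center of its bounding box
    in normalized image coordinates (0.0 to 1.0)."""
    text: str
    x: float  # horizontal center
    y: float  # vertical center

def associate_items_with_prices(items, prices, y_tolerance=0.02):
    """Pair each item box with the price box on the nearest row.

    A price is accepted only if its vertical distance to the item
    is within y_tolerance; otherwise the item is left unpaired.
    """
    pairs = []
    for item in items:
        nearest = min(prices, key=lambda p: abs(p.y - item.y))
        if abs(nearest.y - item.y) <= y_tolerance:
            pairs.append((item.text, nearest.text))
    return pairs

# Item names on the left of the receipt, prices on the right.
items = [TextBox("Coffee", 0.10, 0.30), TextBox("Bagel", 0.10, 0.35)]
prices = [TextBox("3.50", 0.90, 0.31), TextBox("2.25", 0.90, 0.36)]
print(associate_items_with_prices(items, prices))
```

A real pipeline would first deskew the image (so that rows are actually horizontal) and handle multi-line items, but row alignment is the core geometric cue.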
However, this problem is more than just OCR. To provide a service that is useful to customers in the context of an application, we add another modeling layer: item classification. We leverage natural language embeddings to classify line items into predefined general categories such as food, clothing, and automotive, with each category having subcategories such as meat and baked goods.
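Embedding-based classification amounts to picking the category whose embedding is most similar to the line item's embedding. The sketch below shows the idea with cosine similarity; the three-dimensional toy vectors stand in for real sentence-encoder embeddings, and the `classify` helper and category names are illustrative assumptions rather than the client's actual taxonomy.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def classify(item_embedding, category_embeddings):
    """Return the category whose embedding is closest to the item's."""
    return max(category_embeddings,
               key=lambda c: cosine(item_embedding, category_embeddings[c]))

# Toy 3-d embeddings; a real system would use a sentence encoder.
categories = {
    "food":       [1.0, 0.0, 0.0],
    "clothing":   [0.0, 1.0, 0.0],
    "automotive": [0.0, 0.0, 1.0],
}
item = [0.9, 0.1, 0.0]  # pretend embedding of "sourdough loaf"
print(classify(item, categories))
```

The same scheme nests naturally: after the general category is chosen, a second pass over that category's subcategory embeddings (e.g., meat, baked goods) assigns the finer label.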
Our AI-based solution eliminated most of the manual data entry the old system required and, as our client put it, even “really increased our cool factor.” The new system reduced the average time required to use the feature by 67% while simultaneously increasing feature usage by over 55%.