Text Mining Audit Reports to Determine Grant Recipient Risk

The Challenge

The client needed to optimize its strategies for fighting fraud, waste, and abuse in federal grants. Grant recipients must undergo a Single Audit, performed by an independent certified public accountant (CPA), as defined by the U.S. Office of Management and Budget in Circular A-133. A key element of each audit report is the Findings section, where the independent auditors list the ways in which the auditee is not following financial best practices or the requirements of the government grant program. The project goal was to use text mining and machine learning to extract the independent CPAs' Findings from the reports and use them to evaluate grant recipient risk.

The Solution

Elder Research partnered with Excella Consulting to build an end-to-end solution in the client's AWS cloud. The solution included data ingestion, unsupervised and supervised machine learning, and a dashboard visualization and drill-down tool built on Looker. We extracted approximately 12 million PDF pages (about five years of audits), performed text mining, and incorporated other structured data sources to assign risk scores to recipients. During text mining, a model ensemble classified the pages. As a next step, we are extracting and analyzing the text of each individual Finding.
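The case study does not say which models made up the page-classification ensemble or how their outputs were combined. As an illustration only, the sketch below assumes soft voting: each model emits a probability that a page belongs to an audit's Findings section, and the ensemble takes a weighted average of those probabilities before applying a cutoff. The function names `soft_vote` and `classify_pages`, the model labels, the scores, and the 0.5 threshold are all hypothetical.

```python
from __future__ import annotations


def soft_vote(model_scores: dict[str, list[float]],
              weights: dict[str, float] | None = None) -> list[float]:
    """Combine per-page probabilities from several classifiers by a
    weighted average (soft voting). Equal weights are used by default."""
    names = list(model_scores)
    if not names:
        raise ValueError("at least one model is required")
    n_pages = len(model_scores[names[0]])
    if any(len(s) != n_pages for s in model_scores.values()):
        raise ValueError("every model must score the same pages")
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    return [
        sum(weights[name] * model_scores[name][i] for name in names) / total
        for i in range(n_pages)
    ]


def classify_pages(probs: list[float], threshold: float = 0.5) -> list[bool]:
    """Flag a page as a Findings page when its ensemble score meets the threshold."""
    return [p >= threshold for p in probs]


# Hypothetical probabilities that each of three pages belongs to a
# Findings section, from two unnamed models (illustrative values only).
scores = {
    "model_a": [0.90, 0.20, 0.60],
    "model_b": [0.95, 0.10, 0.30],
}
combined = soft_vote(scores)
print([round(p, 3) for p in combined])  # [0.925, 0.15, 0.45]
print(classify_pages(combined))         # [True, False, False]
```

Averaging probabilities rather than counting hard votes lets a confident model outweigh an uncertain one, which is one common reason to prefer soft voting; the real project may have combined its models differently.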

Results

More than 260 auditors, investigators, evaluators, and lawyers now use the tool, and it has helped launch or support eight audits in four regions, three evaluations in three regions, and one major investigations project. The project has become one of the client's five most important initiatives. The precision-recall curve of our page classification algorithm is shown below: the black line is a baseline Naive Bayes classifier, and our CNN/RNN hybrid is in red. We exceeded our initial goal of 80% precision and 95% recall.

[Figure: Precision-recall curve for page classification; baseline Naive Bayes in black, CNN/RNN hybrid in red.]
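A goal of at least 80% precision and 95% recall amounts to asking whether some point on the precision-recall curve satisfies both bounds at once, where precision is TP / (TP + FP) and recall is TP / (TP + FN). A minimal standard-library sketch of that check follows, assuming page-level classifier scores in [0, 1], binary labels (1 = Findings page), and the rule "predict positive when score >= threshold". The function names and the toy data are assumptions for illustration, not the project's evaluation code.

```python
from __future__ import annotations


def precision_recall_points(scores: list[float],
                            labels: list[int]) -> list[tuple[float, float, float]]:
    """Return (threshold, precision, recall) for each distinct score used as a
    threshold, in descending threshold order. A page counts as a predicted
    Findings page when its score is >= the threshold."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive page")
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        # tp + fp >= 1 because t equals at least one score.
        points.append((t, tp / (tp + fp), tp / total_pos))
    return points


def meets_target(points: list[tuple[float, float, float]],
                 min_precision: float = 0.80,
                 min_recall: float = 0.95) -> float | None:
    """Return the highest threshold whose precision and recall both reach the
    targets, or None if no operating point on the curve does."""
    for t, precision, recall in points:
        if precision >= min_precision and recall >= min_recall:
            return t
    return None


# Toy scores and labels (1 = Findings page), invented for illustration.
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.40, 0.30, 0.10]
labels = [1, 1, 1, 0, 1, 1, 0, 0]
curve = precision_recall_points(scores, labels)
print(meets_target(curve))  # 0.4
```

Precision is not guaranteed to change monotonically as the threshold drops, so the sketch scans every operating point instead of assuming that one fixed cutoff is optimal.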
