Elder Research has extensive experience helping enterprises find and eliminate fraud, waste and abuse (FWA). We have helped several state-level labor departments identify fraudulent unemployment insurance (UI) claims. This capability is critical today, given the record-setting unemployment of the last few months. This blog post is meant to help agencies that are wary of domestic and foreign abusers of the UI system by describing our successful work, where and how it is being used, and how to leverage the lessons learned and tools built to identify FWA.
The Tool – DAPM
The Data Analytics and Predictive Modeling (DAPM) tool combines business rules, mathematical algorithms, and predictive machine learning models to evaluate claims data and assess the risk of overpayment. Because the DAPM tool is designed to be adoptable by any state workforce agency, it uses generally available UI data sets – claims data, certifications data, employment and income data, etc. – with the vision that states adopting the tool could first implement the core product and then build out functionality specific to their UI landscape. The analytics generated by DAPM are delivered using Elder Research’s RADR, a powerful, server-based data analytics product that fuses data from multiple sources, supports sophisticated predictive and machine learning risk models, and provides a flexible and intuitive visual interface.
The DAPM analytical modules can be classified into four broad groups:
- IP address-based modules (IP Location, Changing IP, Sharing IP, Travel Risk, Employer IP Sharing Risk)
- Associations to previously flagged claimants module
- Report-based modules (Quarterly Wage Report, National Directory of New Hires)
- Supervised models (k-Nearest Neighbors (kNN), Support Vector Machine, and Random Forest)
Each of these modules generates a risk score for each claimant. The module scores are aggregated and normalized to a scale of 0 to 100, where a low score means an overpayment is less likely and a high score means an overpayment is more likely.
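The post does not say how the module scores are combined, so the following is only a minimal sketch under stated assumptions: each module emits a score between 0 and 1, the scores are combined with a weighted mean, and the result is min-max rescaled to 0 to 100. The module names, weights, and function names are hypothetical and are not taken from the DAPM source.

```python
from typing import Dict

def aggregate_risk(module_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of one claimant's per-module scores (each assumed in [0, 1])."""
    total_weight = sum(weights[name] for name in module_scores)
    weighted_sum = sum(score * weights[name] for name, score in module_scores.items())
    return weighted_sum / total_weight

def normalize_to_100(raw: Dict[str, float]) -> Dict[str, float]:
    """Min-max rescale so the lowest raw score maps to 0 and the highest to 100."""
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:  # all claimants scored the same; nothing to spread out
        return {claimant: 0.0 for claimant in raw}
    return {claimant: 100.0 * (value - lo) / (hi - lo) for claimant, value in raw.items()}

# Hypothetical weights and module outputs, for illustration only.
weights = {"sharing_ip": 2.0, "wage_report": 1.0, "random_forest": 3.0}
claimants = {
    "A": {"sharing_ip": 0.0, "wage_report": 0.0, "random_forest": 0.0},
    "B": {"sharing_ip": 1.0, "wage_report": 1.0, "random_forest": 1.0},
    "C": {"sharing_ip": 0.5, "wage_report": 0.5, "random_forest": 0.5},
}

raw = {cid: aggregate_risk(scores, weights) for cid, scores in claimants.items()}
final = normalize_to_100(raw)
print(final)
```

Min-max scaling makes scores relative to the current batch of claimants. A deployment that needs scores to be comparable across refreshes would likely want a fixed reference range instead.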
Performance
Gain charts are used to assess the performance of this scoring process. A gain chart measures how quickly a model identifies overpayments by charting the percentage of certifications in the validation data set that subject matter experts (SMEs) would need to examine in order to detect a given percentage of the overpayments in the data set. For example, a model that identifies 25% of overpayments after examining only 5% of certifications would be extremely useful: it would identify overpayments at five times the baseline rate.
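As a rough illustration of the arithmetic behind a single point on a gain chart, the sketch below ranks certifications by score, examines the top fraction, and reports the share of overpayments found. The function names and the toy data are assumptions made for this example.

```python
from typing import Sequence

def gain_at(scores: Sequence[float], overpaid: Sequence[bool], fraction: float) -> float:
    """Share of all overpayments found in the top `fraction` of certifications by score."""
    ranked = sorted(zip(scores, overpaid), key=lambda pair: pair[0], reverse=True)
    n_examined = int(round(fraction * len(ranked)))
    found = sum(1 for _, flag in ranked[:n_examined] if flag)
    total = sum(1 for flag in overpaid if flag)
    return found / total if total else 0.0

def lift(gain: float, fraction: float) -> float:
    """How many times faster than random review the model finds overpayments."""
    return gain / fraction

# Toy validation set: 20 certifications, 4 of them overpaid.
scores = list(range(20, 0, -1))                # highest score first
overpaid = [i in (0, 5, 10, 15) for i in range(20)]

g = gain_at(scores, overpaid, 0.05)            # examine the top 5% (1 certification)
print(g, lift(g, 0.05))
```

In this toy set the single highest-scoring certification is one of the four overpayments, so examining 5% of certifications finds 25% of overpayments, which reproduces the five-times-baseline figure.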
The gain charts contain three lines: the performance of the risk-scoring model (labeled “Model”), the baseline performance (“Random”), and the best performance possible if the model knew exactly which certifications were overpaid (“Wizard”). The charts below reflect a sample of results, including one that focuses on the highest-scoring 5% of claimants. The top 5% to 10% of claimants are typically of greatest interest because they represent the claims with the highest risk, which are the ones most worth examining given limited resources. Results will vary by deployment, and a combination of model tuning and additional modules can hone results for a specific agency’s requirements.
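The two reference lines have simple closed forms, sketched here under the assumption that `prevalence` is the fraction of certifications in the validation set that are overpaid. A random ordering finds overpayments in proportion to the share examined, while a perfect ordering finds nothing but overpayments until none remain. The function names and the 10% prevalence are hypothetical.

```python
def random_gain(fraction: float) -> float:
    """Expected share of overpayments found when certifications are examined in random order."""
    return fraction

def wizard_gain(fraction: float, prevalence: float) -> float:
    """Share of overpayments found when every overpaid certification is ranked first."""
    return min(1.0, fraction / prevalence)

# Hypothetical validation set in which 10% of certifications are overpaid.
prevalence = 0.10
for fraction in (0.05, 0.10, 0.50):
    print(fraction, random_gain(fraction), wizard_gain(fraction, prevalence))
```

A model's own gain curve falls between these two lines when it is better than chance, so its distance from each shows how much of the achievable improvement it captures.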
Delivery
Elder Research built Extract, Transform, and Load (ETL) pipelines to pull the required data from an agency’s data warehouse and prepare it for analysis, and configured RADR to allow analysts to view the results and work cases. The pipelines were automated to pull data on a periodic basis, update the model scores, and push the updates to RADR to provide analysts with fresh case scores. The system was deployed on internal agency servers but could be deployed in a cloud environment as well.
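The shape of one automated refresh (pull, prepare, score, push) can be sketched as a function that takes each stage as a callable. Everything here is a hypothetical stand-in: the stage functions, the field names, and the in-memory "warehouse" and "dashboard" are illustrative only and do not represent the DAPM code or any RADR interface.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]

def run_refresh(
    extract: Callable[[], Iterable[Record]],
    transform: Callable[[Record], Record],
    score: Callable[[Record], float],
    publish: Callable[[List[Record]], None],
) -> int:
    """Run one scheduled refresh and return the number of records published."""
    prepared = [transform(row) for row in extract()]                       # pull and prepare
    scored = [{**row, "risk_score": score(row)} for row in prepared]       # update scores
    publish(scored)                                                        # push to the case view
    return len(scored)

# In-memory stand-ins for the warehouse and the analyst-facing store.
warehouse: List[Record] = [
    {"claimant_id": "A", "weekly_benefit": "300"},
    {"claimant_id": "B", "weekly_benefit": "450"},
]
dashboard: List[Record] = []

n_published = run_refresh(
    extract=lambda: warehouse,
    transform=lambda row: {**row, "weekly_benefit": float(row["weekly_benefit"])},
    score=lambda row: 0.0,  # placeholder; a real deployment would call the trained models
    publish=dashboard.extend,
)
print(n_published, dashboard)
```

Passing the stages in as callables keeps the refresh logic independent of where the data lives, which is one way to accommodate either an on-premises or a cloud deployment.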
The RADR configuration developed for this work lets investigators easily view the claim risk scores, and it can be altered to change or add views for a specific need. The current configuration supports several views:
- Claimant Score Listings
- Aggregation of Claimants Across Key Attributes
  - Employer
  - IP Address
  - Phone Number
- Claimant Details
  - Includes a graph to associate attributes to other claimants
[Figures: Claimant Score Listing; Aggregation – IP Address Example; Claimant Detail – Summary; Claimant Detail – Graph]
Leveraging DAPM
Given the recent influx of UI claims, we are working with several state agencies to enhance the current deployment and demonstrate how DAPM can improve their analysts’ efficiency. The source code required for implementing the DAPM ETL pipeline and models has been released as open source. While a full implementation will rely on agency-specific data, this solution provides the framework for the pipeline and model training. Elder Research can be consulted to support, install, and/or augment these models for specific requirements.