Researchers Detect Medicare Deception by Programming Computers to Detect Fraud

Medicare, the primary health care coverage for Americans 65 and older, accounts for 20 percent of health care spending in the United States. Estimates of how much of that money is lost to fraud vary widely: authorities put yearly losses from Medicare fraud, waste or abuse at roughly $19 billion to $65 billion.

Today, human auditors and investigators painstakingly check thousands of Medicare claims by hand for patterns that may indicate foul play or fraud. According to the U.S. Department of Justice, current fraud enforcement efforts depend mostly on health care professionals who come forward with information about Medicare fraud.
 
The journal Health Information Science and Systems recently published a study that is the first to apply advanced data analytics and machine learning to big data from Medicare Part B, with the aim of automating the fraud detection process.
 
Machine learning is a branch of artificial intelligence based on the idea that computer systems can learn from data and identify patterns. In this study, computers were programmed to predict, classify and flag potentially fraudulent events. This approach could significantly improve fraud detection and lighten the workload for auditors and investigators.
 
Researchers from Florida Atlantic University’s (FAU’s) Department of Computer and Electrical Engineering and Computer Science examined Medicare Part B data from 2012 to 2015, focusing on detecting fraudulent provider claims within a dataset of 37 million cases. Cases labeled as “fraud” include patient abuse or neglect and billing for services not rendered. Physicians and other providers who commit such fraud are excluded from federal health care programs like Medicare.
 
The researchers aggregated the 37 million cases into a smaller dataset of 3.7 million and devised a process to map fraud labels to known fraudulent providers. Medicare Part B data include provider information, average payments and charges, procedure codes, the number of procedures performed and the provider’s medical specialty, known as the provider type.
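
The article does not say how the 37 million cases were grouped into 3.7 million. As a minimal sketch, assuming each row is one provider’s billing for one procedure code in one year and that aggregation collapses rows into one summary per provider (NPI) per year, the step might look like this in Python. The ClaimRow fields, the aggregate_by_provider_year function and all sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimRow:
    """One hypothetical Part B row: a provider's billing for one procedure code in one year."""
    npi: str
    year: int
    provider_type: str
    procedure_code: str
    services: int        # number of procedures performed
    avg_payment: float   # average Medicare payment per procedure


def aggregate_by_provider_year(rows):
    """Collapse procedure-level rows into one summary per (npi, year)."""
    totals = defaultdict(
        lambda: {"provider_type": None, "services": 0, "payment": 0.0, "codes": set()}
    )
    for r in rows:
        t = totals[(r.npi, r.year)]
        t["provider_type"] = r.provider_type
        t["services"] += r.services
        t["payment"] += r.services * r.avg_payment  # total payment = count x average
        t["codes"].add(r.procedure_code)
    return {
        key: {
            "provider_type": t["provider_type"],
            "services": t["services"],
            "total_payment": round(t["payment"], 2),
            "distinct_codes": len(t["codes"]),
        }
        for key, t in totals.items()
    }


rows = [
    ClaimRow("1000000001", 2015, "Dermatology", "CODE_A", 10, 50.0),
    ClaimRow("1000000001", 2015, "Dermatology", "CODE_B", 5, 20.0),
    ClaimRow("1000000002", 2015, "Cardiology", "CODE_C", 8, 12.5),
]
summary = aggregate_by_provider_year(rows)
print(summary[("1000000001", 2015)])
# {'provider_type': 'Dermatology', 'services': 15, 'total_payment': 600.0, 'distinct_codes': 2}
```

Rebuilding total payment as count times average keeps the summary consistent across codes; a plain mean of per-code averages would overweight rarely billed procedures.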
 
To obtain exact matches, the researchers used the National Provider Identifier (NPI), a unique identifier the federal government issues to each health care provider, to match fraud labels to the Medicare Part B data. Any provider whose NPI appeared in the “excluded” database was flagged as “fraudulent.” The researchers also predicted each physician’s specialty and checked whether the predicted specialty differed from the actual specialty recorded in the Medicare Part B data.
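
Exact NPI matching turns labeling into a simple set-membership test. The sketch below assumes the exclusion list is available as a collection of NPI strings; label_providers and the NPIs shown are hypothetical and not taken from the study.

```python
def label_providers(provider_npis, excluded_npis):
    """Label each provider 1 ("fraudulent") if its NPI is on the exclusion list, else 0.

    Matching on the exact NPI string avoids the ambiguity of matching by name or address.
    """
    excluded = set(excluded_npis)  # a set gives constant-time membership checks
    return {npi: int(npi in excluded) for npi in provider_npis}


labels = label_providers(
    ["1000000001", "1000000002", "1000000003"],  # hypothetical Part B providers
    ["1000000002"],                               # hypothetical exclusion list
)
print(labels)  # {'1000000001': 0, '1000000002': 1, '1000000003': 0}
```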
 
Taghi M. Khoshgoftaar, Ph.D., co-author and Motorola Professor in FAU’s Department of Computer and Electrical Engineering and Computer Science, explains, “For example, if a dermatologist is accurately classified as a cardiologist, then this could indicate that particular physician is acting in a fraudulent or wasteful way.”
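
The article does not name the classifier used to predict a provider’s specialty. Purely as an illustration, the sketch below swaps in a simple nearest-centroid classifier over each provider’s mix of procedure codes: it learns the average mix for each specialty, predicts the closest one, and flags providers whose predicted specialty differs from the stated one. All function names, procedure labels and data are hypothetical.

```python
import math
from collections import defaultdict


def normalize(counts):
    """Convert {procedure: count} to proportions so billing volume does not dominate."""
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()} if total else {}


def distance(a, b):
    """Euclidean distance between two sparse proportion vectors."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def fit_centroids(training):
    """training: list of (specialty, counts). Returns the mean procedure mix per specialty."""
    sums = defaultdict(lambda: defaultdict(float))
    sizes = defaultdict(int)
    for specialty, counts in training:
        sizes[specialty] += 1
        for code, share in normalize(counts).items():
            sums[specialty][code] += share
    return {s: {c: v / sizes[s] for c, v in vec.items()} for s, vec in sums.items()}


def predict_specialty(centroids, counts):
    """Predict the specialty whose average procedure mix is closest."""
    mix = normalize(counts)
    return min(centroids, key=lambda s: distance(centroids[s], mix))


def flag_mismatches(centroids, providers):
    """providers: list of (npi, stated_specialty, counts); return those predicted differently."""
    flagged = []
    for npi, stated, counts in providers:
        predicted = predict_specialty(centroids, counts)
        if predicted != stated:
            flagged.append((npi, stated, predicted))
    return flagged


training = [
    ("Dermatology", {"SKIN_BIOPSY": 9, "OFFICE_VISIT": 1}),
    ("Dermatology", {"SKIN_BIOPSY": 7, "OFFICE_VISIT": 3}),
    ("Cardiology", {"ECG": 8, "OFFICE_VISIT": 2}),
    ("Cardiology", {"ECG": 6, "OFFICE_VISIT": 4}),
]
providers = [
    ("1000000001", "Dermatology", {"SKIN_BIOPSY": 5, "OFFICE_VISIT": 1}),
    ("1000000002", "Dermatology", {"ECG": 9, "OFFICE_VISIT": 1}),
]
centroids = fit_centroids(training)
print(flag_mismatches(centroids, providers))
# [('1000000002', 'Dermatology', 'Cardiology')]
```

In this toy data the second provider bills like a cardiologist while listed as a dermatologist, the kind of mismatch Khoshgoftaar describes. A mismatch is a lead for review, not proof of fraud.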
 
For the study, Khoshgoftaar; senior author Richard A. Bauder, a Ph.D. student and data scientist at FPL; and their Ph.D. student collaborators had to address the severe class imbalance in the original labeled dataset. Non-fraudulent providers far outnumber fraudulent ones, which is problematic for machine learning: the algorithms try to distinguish between the classes, but the majority class dominates and can fool the learner into treating nearly every case as non-fraudulent.
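
A short calculation shows how imbalance fools a learner. Assuming, purely for illustration, 10 fraudulent providers among 10,000 (not the study’s actual ratio), a model that never predicts fraud still scores 99.9 percent accuracy while catching no fraud at all. The accuracy_and_recall helper is hypothetical.

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall) where recall is measured on the fraud class (label 1)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    positives = sum(y_true)
    caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    recall = caught / positives if positives else 0.0
    return correct / len(y_true), recall


# Hypothetical population: 10 fraudulent providers among 10,000 (not the study's data).
y_true = [1] * 10 + [0] * 9_990
always_not_fraud = [0] * len(y_true)  # a learner overwhelmed by the majority class
accuracy, recall = accuracy_and_recall(y_true, always_not_fraud)
print(f"accuracy={accuracy:.3f} recall={recall:.3f}")  # accuracy=0.999 recall=0.000
```

This is why detection of the positive (fraud) class, rather than overall accuracy, is the useful yardstick when comparing learners on imbalanced data.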
 
The researchers tackled this problem with random undersampling, which randomly discards cases from the majority (non-fraud) class; this reduced the dataset from 3.7 million cases to about 14,000 cases in the configuration that gave the best detection results. They created seven class distributions, ranging from severely imbalanced to balanced, and tested six different learners on each. The random forest learner RF100 was the best at detecting potential fraud events. Interestingly, keeping more of the non-fraud cases helped the model distinguish between fraud and non-fraud cases: the researchers found the “sweet spot” for detecting Medicare fraud to be a 90:10 distribution of non-fraudulent to fraudulent data.
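
Random undersampling keeps every minority (fraud) case and randomly discards majority (non-fraud) cases until a target ratio is reached. A minimal sketch, assuming cases are held in Python lists; random_undersample and its majority_share parameter are hypothetical names, and the study’s own sampling code may differ.

```python
import random


def random_undersample(majority, minority, majority_share, seed=0):
    """Keep all minority (fraud) cases and a random subset of majority (non-fraud) cases
    so that majority cases make up `majority_share` of the result (0.9 means 90:10)."""
    if not 0.5 <= majority_share < 1:
        raise ValueError("majority_share must be in [0.5, 1)")
    # Solve keep / (keep + len(minority)) = majority_share for keep.
    keep = round(len(minority) * majority_share / (1 - majority_share))
    keep = min(keep, len(majority))  # cannot keep more cases than exist
    rng = random.Random(seed)  # seeded for reproducible samples
    sample = rng.sample(majority, keep) + list(minority)
    rng.shuffle(sample)
    return sample


majority = [("non-fraud", i) for i in range(10_000)]  # hypothetical cases
minority = [("fraud", i) for i in range(100)]
train = random_undersample(majority, minority, majority_share=0.9)
print(len(train), sum(label == "fraud" for label, _ in train))  # 1000 100
```

Calling it with several majority_share values between 0.99 and 0.5 yields a family of training sets from severely imbalanced to balanced; the article gives only the 90:10 ratio, not the other six distributions the researchers compared.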
 
“Our goal is to enable machine learners to cull through all of this data and flag anything suspicious. Then, we can alert investigators and auditors who will only have to focus on 50 cases instead of 500 cases or more,” says Bauder.
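
Bauder’s point about reviewing 50 cases instead of 500 amounts to ranking cases by a model’s fraud score and sending only the top of the list to investigators. A minimal sketch, assuming the trained model outputs an estimated fraud probability per case; the case IDs, scores and top_k_for_review are hypothetical.

```python
import heapq


def top_k_for_review(scores, k=50):
    """Return the k case IDs with the highest fraud scores, most suspicious first.

    scores: mapping of case ID -> estimated probability of fraud.
    """
    top = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    return [case_id for case_id, _ in top]


scores = {"case-A": 0.10, "case-B": 0.92, "case-C": 0.55, "case-D": 0.97}
print(top_k_for_review(scores, k=2))  # ['case-D', 'case-B']
```

heapq.nlargest avoids sorting every score, which helps when millions of cases are scored but only a few dozen are reviewed.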
 
Khoshgoftaar explains, “The goal is to build a predictive model and create a better methodology for federal auditors. The fact that we are the first to use big data to uncover Medicare fraud is very important. If you are lucky enough to be the first, other researchers come to you. So far we have touched only the surface of this problem, really just uncovered the tip of the iceberg.”
 
This detection method also has applications for other types of fraud, including insurance, banking and finance. The researchers are currently adding other Medicare-related data sources, such as Medicare Part D.
 
Stella Batalama, Ph.D., dean of FAU’s College of Engineering and Computer Science, foresees broader impacts from this research: “The methodology being developed and tested in our college could be a game changer for how we detect Medicare fraud and other fraud in the United States as well as abroad.”