Showing posts with label healthcare. Show all posts

Saturday, May 13, 2017

Why Data Science Can't Find the Needle in the Haystack

A colleague of mine wants a predictive model. He is trying to determine which people on a health insurance plan will visit the hospital in the next couple of months. He has pretty good data for making this type of prediction. He knows who visited the hospital in the past; and he knows they are more likely to revisit. He knows what illnesses these people have, and which illnesses likely result in hospital visits. He knows what drugs have been prescribed, and whether patients are taking their drugs.
Even with this data and even with very good models, he still complains that the predictions are not good enough. His problem is too many false positives. And he simply doesn’t have enough employees to review every patient the model predicts.

Data science is a great tool, but it is not perfect. And if you are going to wield the data science tool, you should be aware of its shortcomings. Data science is simply not very good at finding a needle in a haystack.

This concept can be illustrated with an example. Say a banker is managing the mortgages of 500,000 homeowners. He knows from experience that roughly 1,000 of these homeowners will default on their loans. There is plenty of data to help zero in on these 1,000 people: zip code, income, payment history, and credit rating. He knows that if he can get the right people some assistance, they may not default on their loans.

We have the data to build a predictive model. But regardless of how good the model is, it will not be perfect. When the model is run, each homeowner will be classified as at-risk for default or not at-risk. In this scenario, there are four possible outcomes for each homeowner. The homeowner is at-risk and is properly identified by the model, or the homeowner is not at-risk and is properly identified by the model. These are the two accurate predictions.

Every predictor gets some wrong too. When the homeowner is not at-risk, but the model says he is, that is a false positive. If the homeowner is at-risk and was not identified by the model, that is a false negative. In statistics, false positives are also referred to as “type I errors”. False negatives are referred to as “type II errors”.

Now let’s say that we construct a model that is 80% accurate, which is a rule-of-thumb threshold for a good prediction. With 80% accuracy on 500,000 loans, 400,000 will be correctly predicted and 100,000 will not. At an 80% prediction rate, 800 of the 1,000 at-risk homeowners would be predicted correctly.

Of course, an 80% success rate means there is also a 20% failure rate. I mentioned that 100,000 are not predicted correctly. Of these, 200 are false negatives, the balance of the 1,000 target loans. Subtracting the 200 false negatives from the 100,000 people identified incorrectly leaves 99,800 false positives. That is an extraordinary 125 times as many false positives as correctly predicted at-risk loans (99,800 versus 800).

Even if the model can predict at the incredible rate of 99%, the number of false positives (4,990) will still outnumber the correctly identified at-risk cases (990) by about five to one.
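
The arithmetic in the banker example can be checked with a short script. This is only a sketch: the function name and dictionary keys are made up for illustration, and it assumes the model is equally accurate on at-risk and not-at-risk homeowners, which is the simplification the numbers in this post rely on.

```python
def confusion_counts(population, at_risk, accuracy):
    """Split a population into the four prediction outcomes.

    Assumption: the same accuracy applies to at-risk and
    not-at-risk homeowners alike.
    """
    not_at_risk = population - at_risk
    true_pos = round(at_risk * accuracy)       # at-risk, flagged by the model
    false_neg = at_risk - true_pos             # at-risk, missed by the model
    true_neg = round(not_at_risk * accuracy)   # not at-risk, correctly cleared
    false_pos = not_at_risk - true_neg         # not at-risk, flagged anyway
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        "false_pos": false_pos,
    }


# The banker example: 500,000 mortgages, roughly 1,000 expected defaults.
for accuracy in (0.80, 0.99):
    counts = confusion_counts(500_000, 1_000, accuracy)
    ratio = counts["false_pos"] / counts["true_pos"]
    print(f"accuracy {accuracy:.0%}: {counts}")
    print(f"  false positives per correctly flagged loan: {ratio:.1f}")
```

At 80% accuracy this gives 800 correctly flagged loans against 99,800 false positives. At 99% it gives 990 against 4,990.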


The problem here isn’t with data science or prediction methods. It is simple math. When trying to use statistics to find a very small number of cases among a very large population, the false positives will almost always greatly outnumber the correctly predicted positives. This type of problem is truly a needle in a haystack.

Wednesday, February 18, 2009

Don’t think of BI as an all-in-one solution

I recently had a short conversation on Business Intelligence with one of my peers. I tried to explain the premise that a Business Intelligence application in our industry (Health Care) should not be a one-size-fits-all solution. Instead, the technology should be tailored to the types of questions that it will need to answer most frequently.

When he claimed "of course it has to be able to answer any question, otherwise we could just write queries," I realized I had failed to make my point. He is not the first person I've met to have this opinion. In fact, the opinion is pretty pervasive among my peers, and it is wrong.

Our Business Intelligence solution is a textbook case highlighting this point. Its saga is a story for another day, but in our attempt to make it very flexible we failed to make it strong. That is, there's no limit to the reports you can create, but it's not great at answering any particular question.

I liken this to the difference between a hammer and a Swiss Army Knife. A hammer is great at driving nails, better than any other tool for this task. It also happens to be pretty good at removing nails. A Swiss Army Knife can do a lot of things, from clipping nails to opening cans. But it's not particularly good at any of them.

The real beauty of a hammer, though, is the other things that it can do pretty well. In fact, if put to the test, it isn't hard to come up with at least as many tasks for a hammer as for a Swiss Army Knife. It can be a door-stop, a paper-weight, a meat tenderizer, a garden shovel, and more. Sure, it's lousy at tightening screws, but it can really drive nails.

Don't get me wrong. There's a place for firms that build all-in-one software. In fact, my former employer Information Builders is one such firm. My current BI solution is built on a MicroStrategy platform, another Swiss Army Knife vendor. Vertical solutions, like our health care application, need to be targeted; they need to be really good at particular questions.

Unfortunately, many designers of Business Intelligence solutions try to make Swiss Army Knives when they really need hammers. And given a good hammer and an innovative user, there will soon be many other tasks suitable for the tool.

Wednesday, February 04, 2009

Natural orifice surgery; maybe it's cool, but it gives me the creeps

Apparently Johns Hopkins research doctors have successfully removed a kidney through, um, the donor's orifice. It's called "transvaginal nephrectomy" or, more broadly, "natural orifice" surgery. In the case of us guys, the idea would be to use the rectum.

The procedure reduces external damage or injury to the donor. The necessary incisions are smaller. Recovery time is quicker. And pain is reduced.

While the benefits seem positive, it still gives me the "willies" to think that someday I could donate a kidney out of my butt.
