Dashboard Dangers 5: Don't Discriminate!
Thank you to everyone who has read the prior posts in this series. If you haven't read them yet, you can find them here:
The purpose of this post is to discuss an important topic: discrimination. It's critical to be on the lookout for discrimination in any analytical model you build.
There are two types of discrimination to watch out for in your analytical models: explicit discrimination and accidental (implicit) discrimination. This matters because a discriminatory model can put you in violation of local or federal law. So let's define what we mean by discrimination:
Explicit Discrimination - This probably doesn't need much explanation, so I'll keep it quick. Let's say I go to a bank and apply for a mortgage. The bank runs a series of algorithms to determine whether I am creditworthy. It cannot deny me based on my race, gender, age, religion, or several other protected factors. Click here to see the full list of factors. The bank can offer me a different interest rate based on my credit rating, but it cannot discriminate based on the factors above.
Accidental/Implicit Discrimination - OK, I understand explicit discrimination, but what do you mean by accidental discrimination? How do I accidentally discriminate against someone based on their race? We live in a world that consumes ever-increasing amounts of data. As we collect more data, we look to make even more connections to solve business problems. There isn't enough manpower for humans to analyze all of it, so we outsource the analysis to machines. We build machine learning algorithms that train computers to find connections in the data. The computers can work 24/7, connecting dots and surfacing correlations we didn't even know existed.
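One way those hidden connections sneak in is through proxy features: an innocent-looking input that happens to track a protected attribute. As a rough sketch (with invented data, a made-up feature name, and an arbitrary threshold), you can check a candidate feature for this before training anything:

```python
# Minimal proxy check: measure how strongly a candidate input feature
# correlates with a protected attribute. All data here is invented.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# 1 = applicant lives in XYZ city; 1 = applicant belongs to one racial group.
# In this fabricated sample the two line up almost perfectly.
lives_in_xyz = [1] * 10 + [0] * 10
protected    = [1] * 9 + [0, 1] + [0] * 9

r = pearson(lives_in_xyz, protected)  # r = 0.80 for this sample
if abs(r) > 0.7:  # arbitrary threshold for this sketch
    print(f"'lives_in_xyz' may be a proxy for a protected attribute (r = {r:.2f})")
```

A feature that correlates this strongly with a protected class deserves scrutiny even if it looks harmless on its own.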
Let's take the same scenario as before. Suppose I apply for a mortgage at a bank. The bank runs my information through its model, and the model determines I am not worthy of a mortgage. Suppose that algorithm analyzed billions of data points and determined that residents of XYZ city have a 95% default rate. As a result, the bank denies my application. So what's the problem with this? Banks can discriminate against you based on the city you live in! You're absolutely correct: they can. However, suppose that city is predominantly one race. If the bank denies residents of a city that is predominantly one race, then it has indirectly discriminated against the applicant based on race, which violates fair-lending regulations (the kind enforced by agencies such as the FDIC). That's why it's very important to understand what connections your algorithms are making and to build safeguards against discrimination into your models.
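One common safeguard is to compare the model's outcomes across groups after the fact. The sketch below uses made-up decisions and the informal "four-fifths" rule of thumb (flag the model when any group's approval rate falls below 80% of the highest group's rate); the names and numbers are illustrative, not a legal test:

```python
# Minimal disparate-impact check on a set of (group, approved) decisions.
# Data and group names are invented for illustration.

def approval_rates(records):
    """records: list of (group, approved) pairs -> {group: approval rate}."""
    totals, approved = {}, {}
    for group, ok in records:
        totals[group] = totals.get(group, 0) + 1
        approved[group] = approved.get(group, 0) + int(ok)
    return {g: approved[g] / totals[g] for g in totals}

def disparate_impact(records):
    """Ratio of the lowest group approval rate to the highest."""
    rates = approval_rates(records)
    return min(rates.values()) / max(rates.values())

# Hypothetical decisions keyed by a proxy attribute (city of residence).
decisions = ([("XYZ City", False)] * 19 + [("XYZ City", True)] * 1
             + [("Elsewhere", False)] * 5 + [("Elsewhere", True)] * 15)

ratio = disparate_impact(decisions)
print(f"Disparate impact ratio: {ratio:.2f}")
if ratio < 0.8:  # four-fifths rule of thumb
    print("Potential discrimination: review the model's inputs")
```

If the grouping key correlates with a protected class, a ratio this low is exactly the kind of indirect discrimination described above, even though race never appeared as an input.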