
Data Science Can’t Fix Hiring (Yet)

by Peter Cappelli

Recruiting managers desperately need new tools, because the existing ones—unstructured interviews, personality tests, personal referrals—aren’t very effective. The newest development in hiring, which is both promising and worrying, is the rise of data science–driven algorithms to find and assess job candidates. By my count, more than 100 vendors are creating and selling these tools to companies. Unfortunately, data science—which is still in its infancy when it comes to recruiting and hiring—is not yet the panacea employers hope for.

Vendors of these new tools promise they will help reduce the role that social bias plays in hiring. And the algorithms can indeed help identify good job candidates who would previously have been screened out for lack of a certain education or social pedigree. But these tools may also identify and promote the use of predictive variables that are (or should be) troubling.

Because most data scientists seem to know so little about the context of employment, their tools are often worse than nothing. For instance, an astonishing percentage of them build their models by simply looking at the attributes of the “best performers” in workplaces and then identifying which job candidates have the same attributes. They use anything that’s easy to measure: facial expressions, word choice, comments on social media, and so forth. But a failure to check whether those attributes actually distinguish high-performing employees from low-performing ones limits the tools’ usefulness. Scooping up data from social media or the websites people have visited also raises important questions about privacy. True, the information can be accessed legally, but the individuals who created the postings didn’t intend or authorize it to be used for such purposes. And is it fair that something you posted as an undergraduate can end up driving a hiring decision about you a generation later?
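
To make the missing step concrete, here is a minimal sketch of that check, assuming a hypothetical `employees` table with a performance rating and one of the easy-to-measure features; every column and function name here is invented for illustration, not taken from any vendor’s product.

```python
import pandas as pd
from scipy import stats

def feature_separates_performance(employees: pd.DataFrame,
                                  feature: str,
                                  rating_col: str = "performance") -> dict:
    """Compare a feature's distribution between top- and bottom-quartile performers."""
    top = employees[employees[rating_col] >= employees[rating_col].quantile(0.75)]
    bottom = employees[employees[rating_col] <= employees[rating_col].quantile(0.25)]

    # Welch's t-test: is there any real difference on this attribute between
    # high performers and low performers, or does it merely describe everyone
    # the company has already hired?
    t_stat, p_value = stats.ttest_ind(top[feature], bottom[feature],
                                      equal_var=False, nan_policy="omit")
    return {
        "feature": feature,
        "top_mean": top[feature].mean(),
        "bottom_mean": bottom[feature].mean(),
        "p_value": p_value,
    }

# A feature that shows no meaningful separation (large p-value, similar means)
# should not drive a hiring model, however easy it is to collect.
```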

Another problem with machine learning approaches is that few employers collect the large volumes of data—number of hires, performance appraisals, and so on—that the algorithms require to make accurate predictions. Although vendors can theoretically overcome that hurdle by aggregating data from many employers, they don’t really know whether individual company contexts are so distinct that predictions based on data from the many are inaccurate for the one.
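
One way to probe that worry is leave-one-company-out validation on the pooled data: train on every employer but one and test on the holdout. The sketch below uses stand-in random data and hypothetical labels; nothing in it comes from any vendor’s actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Stand-in data: 6 employers, 100 hires each, 5 candidate features,
# and a binary outcome (e.g., retained past the first year).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
y = rng.integers(0, 2, size=600)
companies = np.repeat(np.arange(6), 100)

# Train on five employers, test on the sixth, rotating through all of them.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         groups=companies, cv=LeaveOneGroupOut())
print(dict(zip(range(6), scores.round(2))))
# A sharp drop for a held-out company suggests the pooled model does not
# transfer to that company's context.
```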

Yet another issue is that all analytic approaches to picking candidates are backward looking, in the sense that they are based on outcomes that have already happened. (Algorithms are especially reliant on past experiences in part because building them requires lots and lots of observations—many years’ worth of job performance data even for a large employer.) As Amazon learned, the past may be very different from the future you seek. It discovered that the hiring algorithm it had been working on since 2014 gave lower scores to women—even to attributes associated with women, such as participating in women’s studies programs—because historically the best performers in the company had disproportionately been men. So the algorithm looked for people just like them. Unable to fix that problem, the company stopped using the algorithm in 2017. Nonetheless, many other companies are pressing ahead.

The underlying challenge for data scientists is that hiring is simply not like trying to predict, say, when a ball bearing will fail—a question for which any predictive measure might do. Hiring is so consequential that it is governed not just by legal frameworks but by fundamental notions of fairness. The fact that some criterion is associated with good job performance is necessary but not sufficient for using it in hiring.

Take a variable that data scientists have found to have predictive value: commuting distance to the job. According to the data, people with longer commutes suffer higher rates of attrition. However, commuting distance is determined by where you live, which is shaped by housing prices, which in turn relate to income and to race. Picking whom to hire on the basis of where they live therefore most likely has an adverse impact on protected groups such as racial minorities.
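
The standard first-pass test for this kind of adverse impact is the four-fifths rule used in U.S. enforcement guidance: if any group’s selection rate falls below 80% of the highest group’s rate, the screen is suspect. A minimal sketch, assuming a hypothetical applicants table with a demographic group column and a selected flag produced by, say, a commuting-distance cutoff:

```python
import pandas as pd

def adverse_impact_ratios(applicants: pd.DataFrame,
                          group_col: str = "group",
                          selected_col: str = "selected") -> pd.Series:
    """Each group's selection rate divided by the most-selected group's rate."""
    rates = applicants.groupby(group_col)[selected_col].mean()
    return rates / rates.max()

# Made-up numbers: group A passes the screen 60% of the time, group B 40%.
applicants = pd.DataFrame({
    "group": ["A"] * 100 + ["B"] * 100,
    "selected": [True] * 60 + [False] * 40 + [True] * 40 + [False] * 60,
})
ratios = adverse_impact_ratios(applicants)
print(ratios[ratios < 0.8])  # B's ratio is 0.67, below the four-fifths threshold
```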

Companies that use hiring criteria with adverse impacts violate the law unless no other criterion predicts at least as well as the one being used, and that is extremely difficult to determine with machine learning algorithms. Even then, to stay on the right side of the law, they must show why the criterion creates good performance. That might be possible in the case of commuting time, but, at least for the moment, it is not for facial expressions, social media postings, or other measures whose significance companies cannot demonstrate.

In the end, the drawback to using algorithms is that we’re trying to use them on the cheap: building them by looking only at best performers rather than all performers, using only measures that are easy to gather, and relying on vendors’ claims that the algorithms work elsewhere rather than observing the results with our own employees. Not only is there no free lunch here, but you might be better off skipping the cheap meal altogether.