“Simple justice requires that public funds, to which all taxpayers of all races contribute, not be spent in any fashion which encourages, entrenches, subsidizes or results in racial discrimination.”

John F. Kennedy, 1963

This blog post is the second in a series about discrimination in law. It explains one form of discrimination known as disparate impact (see the first part, on disparate treatment). It is based on CM-604 Theories of Discrimination (Title VII) and chapters 6 and 7 of the Title VI Legal Manual. By drawing a parallel between proving discrimination in law and proving discrimination in machine learning, the post aims to help researchers identify and mitigate discriminatory models.

Machine Learning Analogy
For each section, we give a brief history of related efforts in machine learning in a green box like this one!
Main Point
We write the main point for each section in a blue box like this one!

# Protected Attributes

Anti-discrimination laws are designed to prevent discrimination based on protected attributes such as race, color, religion, sex, and national origin (the classes covered by Title VII)1.

# Definition

Disparate impact occurs when policies/practices that appear neutral result in a disproportionate impact on a protected group without any business necessity 2.

Business necessity is context-dependent. For example, a university hiring professors can argue that requiring a Ph.D. is a business necessity (even though this requirement might have a disproportionate impact across races); however, it cannot argue that physical strength is a business necessity (a requirement that would lead to a disproportionate impact across genders).
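A common quantitative screen for this kind of disparity is the EEOC's "four-fifths rule": if the selection rate for the protected group falls below 80% of the reference group's rate, the practice is flagged for further scrutiny. The rule is a rough heuristic rather than a legal threshold, and the group names and numbers below are hypothetical:

```python
# Sketch of the EEOC "four-fifths" rule as a rough disparate impact screen.
# `outcomes` maps each (hypothetical) group to (num_selected, num_applicants).

def selection_rate(selected, applicants):
    return selected / applicants

def disparate_impact_ratio(outcomes, protected, reference):
    """Ratio of the protected group's selection rate to the reference group's."""
    p = selection_rate(*outcomes[protected])
    r = selection_rate(*outcomes[reference])
    return p / r

outcomes = {"group_a": (48, 100), "group_b": (30, 100)}
ratio = disparate_impact_ratio(outcomes, "group_b", "group_a")
print(f"ratio = {ratio:.3f}")               # 0.30 / 0.48 = 0.625
print("flagged under 4/5 rule:", ratio < 0.8)
```

As footnote 4 notes, courts treat such numbers as evidence to be weighed case by case, not as a bright-line test.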

Disparate Impact in Machine Learning
Consider pedestrian detection models used in autonomous cars. Assume that the model appears neutral and performs the same for black and white pedestrians in any given situation. If accidents are nevertheless reported disproportionately often for black pedestrians (e.g., because the model performs poorly in predominantly black neighborhoods), and the developers cannot provide any business reason for this disparity, then this can be considered a disparate impact case.
In contrast, if the model fails to recognize a black pedestrian in a specific situation but recognizes a white pedestrian in the same situation (same place, same outfit, same time of day, etc.), this can be considered a disparate treatment case.

# Legal Procedure

The legal procedure for proving disparate impact consists of three steps (figure below):

1. The charging party (the party that believes it has suffered from disparate impact) must show that a specific practice caused people in a protected group to be treated worse than people not in that group.
2. The respondent (the party that is accused of disparate impact, e.g., employer) attempts to show that it had a legitimate business reason for this specific practice.
3. If the business reason is legitimate, the charging party can show that the respondent could have achieved the same business goal by a less discriminatory method.

In a disparate treatment case, the charging party must only prove that the respondent used the protected attribute in its decision. In a disparate impact case, by contrast, in addition to proving that a disparate impact exists, the charging party must show that there is no business necessity for the policy, or that an alternative exists that achieves the same performance with less disparate impact.

## Example3

A middle school has a “zero tolerance” tardiness policy. Late students must stay in the principal’s office for the rest of the class period regardless of their reason for tardiness.

Step 1: Proof of disparate impact: The evidence shows that Asian-American students are disproportionately losing instruction time under the school’s “zero tolerance” tardiness policy. A further investigation reveals that whites and Hispanics are more likely to live within walking distance of the school, while Asian-American students typically live farther away and must take public transportation. Even if they take the first bus available in the morning, they are often dropped off after school starts.

Step 2: Justification: As justification for the “zero tolerance” tardiness policy, the school articulates the goals of reducing disruption caused by tardiness and encouraging good attendance, both of which the federal funding agency accepts as important educational goals.

Step 3: Assessment of justification and alternatives: The plaintiff first assesses the justification, including whether the policy is reasonably likely to reduce tardiness for these students under these circumstances. If the justification is valid, the plaintiff would then investigate alternatives that would achieve the important educational goals while reducing the adverse impact on Asian-American students (e.g., aligning class schedules with bus schedules). If such alternatives exist, the school is violating Title VI.

## First Step: Charging Party Provides Proof of Disparate Impact

Proving disparate impact involves four substeps:

#### (a) Identifying the Facially Neutral Policy

As the first step, the charging party should identify the policy or practice that allegedly caused the disparate harm.

#### (b) Identifying the Adverse Harm

In the second step, the charging party should identify the kind of harm. Examples include fewer or inferior services or benefits, distribution of burdens and negative effects, and threatened or imminent harm.

#### (c) Establishing Disparity

In the third step, the charging party should show that a disproportionate share of the adversity/harm falls on protected groups. To do so, they first need to define the correct population base (the individuals affected by the policy, or who may be affected by changes to or elimination of the policy). Second, they must show that the disparity is large enough to matter (i.e., sufficiently significant to establish a legal violation)4.
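One common way to operationalize "large enough to matter" is a standard two-proportion z-test on adverse-outcome rates; courts have sometimes looked for disparities of roughly two to three standard deviations, though, per footnote 4, there is no rigid threshold. A minimal sketch using only the standard library, with hypothetical counts:

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-proportion z-test: is the gap in adverse-outcome rates between
    two groups larger than chance alone would plausibly produce?"""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical numbers: 40 of 200 protected-group members harmed vs 20 of 200 others.
z = two_proportion_z(40, 200, 20, 200)
print(f"z = {z:.2f}")   # about 2.80, above the conventional 1.96 cutoff
```

The same computation also shows why defining the population base matters: changing who counts in the denominators can move z above or below any cutoff.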

#### (d) Establishing Causation

In the last step, the charging party should show that the policy actually caused that effect. A disparate impact claim is not valid if the evidence shows that even without the challenged practice, the same disparate impact would have existed. In particular, the Supreme Court has emphasized that entities should not be “held liable for racial disparities they did not create.”5

Machine Learning Analogy for Step 1: Proof of Disparate Impact
We now briefly explain efforts in ML for each step:
Identifying the policy: ML researchers usually investigate a specific ML application (e.g., a face recognition model). However, in some tasks model predictions do not deterministically determine the final decision. For example, in risk assessment for criminal justice, the model’s scores are shown to judges who make the final decision, so it is not clear how a score changes a judge’s decision 6. It is important to pinpoint the exact policy under investigation.
Defining harm: There have been many proposals for defining the harms of ML models (e.g., false-positive parity 7, demographic parity 8, parity in the amount of effort needed to flip the decision 9).
Establishing disparity: ML researchers usually assume they have access to the protected attributes.10 It is important to note that disparate impact (unlike disparate treatment) is very sensitive to the distribution of protected groups: the same policy could have a significant disparate impact in one location but not in another! Understanding exactly who is impacted by the model’s decisions is therefore crucial for this step.
Establishing causation: As explained above, it is common practice to investigate an ML model in isolation, in which case the model’s predictions are a causal outcome of the model itself. However, we should be careful when we study a process driven by interaction between humans and ML models. For example, in hiring through LinkedIn, discrimination can arise from the ML model’s recommendations or from biased recruiters.
Main point of Step 1: Proof of Disparate Impact
Proving disparate impact involves four steps: 1) identifying the policy, 2) establishing harms, 3) establishing disparity, and 4) establishing causation. All of these components in both ML and law are hard to define and are highly context-dependent.
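Two of the harm definitions mentioned above, demographic parity and false-positive-rate parity, can be computed in a few lines. This sketch uses toy binary predictions and labels that are entirely made up:

```python
# Sketch: demographic parity gap and false-positive-rate (FPR) gap
# between two groups, on toy binary predictions/labels.

def rate(preds):
    # Fraction of positive predictions.
    return sum(preds) / len(preds)

def demographic_parity_gap(preds_a, preds_b):
    # Difference in positive-prediction rates between groups.
    return abs(rate(preds_a) - rate(preds_b))

def fpr(preds, labels):
    # Positive-prediction rate among true negatives.
    neg_preds = [p for p, y in zip(preds, labels) if y == 0]
    return sum(neg_preds) / len(neg_preds)

def fpr_gap(preds_a, labels_a, preds_b, labels_b):
    return abs(fpr(preds_a, labels_a) - fpr(preds_b, labels_b))

# Toy data: group A gets positive predictions far more often than group B.
preds_a, labels_a = [1, 1, 0, 1], [1, 0, 0, 1]
preds_b, labels_b = [0, 1, 0, 0], [0, 1, 0, 1]

print(demographic_parity_gap(preds_a, preds_b))            # |0.75 - 0.25| = 0.5
print(fpr_gap(preds_a, labels_a, preds_b, labels_b))       # |0.5 - 0.0| = 0.5
```

Which of these gaps counts as the relevant "harm" is exactly the context-dependent question the legal procedure forces the charging party to answer.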

## Second Step: Respondent Provides Legitimate Business Reasons (justifications)

In this step, the respondent should articulate a “substantial legitimate justification” for the challenged policy or practice. Agencies should thoroughly investigate the facts to determine whether these rationales are supported by sufficient evidence.

Example11. Alexander v. Sandoval, 532 U.S. 275 (2001). James Alexander, Director of the Alabama Department of Public Safety, ordered that the Alabama driver’s license test be administered only in English. Martha Sandoval sued Alexander, claiming that the English-only test policy was discriminatory. The state agency offered several justifications for the English-only rule: highway safety concerns, exam administration difficulties, exam integrity, and budgetary constraints.

The district court found that the recipient had produced no evidence at trial that non-English speakers posed a greater driving safety risk than English speakers. The recipient had also undermined its own safety argument by recognizing valid licenses from non-English speakers of other jurisdictions, making test accommodations for illiterate, deaf, and disabled drivers, and having previously offered the examination in fourteen languages without administrative difficulty.

### Sex Discrimination

“CSXT Transportation conducted isokinetic strength testing as a requirement for workers to be hired for various jobs. The EEOC said that the strength test used by CSXT, known as the “IPCS Biodex” test, caused an unlawful discriminatory impact on female workers seeking jobs as conductors, material handler/clerks, and a number of other job categories. The EEOC also charged that CSXT used two other employment tests, a three-minute step test seeking to measure aerobic capacity and a discontinued arm endurance test, as a requirement for selection into certain jobs, and that those tests also caused an unlawful discriminatory effect on female workers.”

# Other Types of Discrimination

Title VII (Theories of Discrimination) prohibits five kinds of discrimination in employment. Our previous post covered disparate treatment and this post covers disparate impact; for completeness, we briefly mention the other types here.

## Perpetuation of Previous Discriminations

This kind of discrimination occurs when a neutral employment system continues to perpetuate the effects of past discrimination. To prove perpetuation of past discrimination, the charging party must establish a causal connection between the past discrimination and the current policy’s adverse effects.

Example. Jamal claims that GE refuses to hire Blacks as summer employees. GE contends that it gives preference to the children of employees. Prior to 1964, GE employed very few Blacks due to discriminatory hiring practices. The policy of giving a preference to the children of employees perpetuates GE’s past discriminatory practices.

Perpetuation of Previous Discrimination in Machine Learning
Since there is almost no regulation of machine learning models, these models can cause discrimination that then gets perpetuated. For example, assume company X gives access to its API only to the white majority and stores data from their interactions for training. As a result, the final model performs poorly for non-white groups. Company X might argue that collecting data from non-white groups is very costly, but this justification would not hold, since the disparity is an instance of perpetuating its own previous discrimination.
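The feedback loop in this hypothetical can be simulated in a few lines: a model fit only on the group with API access has much larger error on the excluded group, whose distribution it never saw. All groups, distributions, and numbers below are invented for illustration:

```python
import random

random.seed(0)

def sample(group_mean, n):
    # Draw n noisy observations around a group-specific mean.
    return [group_mean + random.gauss(0, 1) for _ in range(n)]

data_a = sample(0.0, 1000)   # group with API access; its data is collected
data_b = sample(2.0, 1000)   # excluded group with a different distribution

# "Train" the simplest possible model (a constant predictor) on group A only.
model = sum(data_a) / len(data_a)

def mse(model, data):
    return sum((model - x) ** 2 for x in data) / len(data)

print(mse(model, data_a))   # roughly the noise variance (~1)
print(mse(model, data_b))   # roughly 1 + 2**2 = 5: much worse for group B
```

The poor performance on group B would then justify (in company X's telling) not serving group B at all, closing the loop that perpetuates the original exclusion.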

## Accommodation

This type of discrimination occurs when a respondent fails to accommodate a prospective or existing employee’s disability or religious practices. The charging party can establish a case by showing that they informed the respondent of the need for accommodation but the respondent failed to accommodate it (note that there is no need to compare similarly situated individuals or show adverse impacts). In response, the respondent can show that the accommodation would have created an undue hardship on the conduct of its business.

Accommodation in Machine Learning
The concept of accommodation in machine learning models is not immediately clear. Should features be chosen such that disabled people can provide data too? Should the task be defined in a way that applies to disabled people as well? Should the test data (for tasks such as pedestrian detection) report accuracy for disabled people (e.g., wheelchair users) separately? All of these could fall under the scope of discrimination with respect to accommodation.

## Retaliation

Title VII prohibits discrimination against individuals because they have filed a Title VII charge, have participated in a Title VII investigation, or have otherwise opposed Title VII discrimination.

# Conclusion

Discrimination in law has been studied for over seven decades, and there are definitions and clear procedures for proving it. The procedure involves two parties: (1) the charging party, who tries to show that discrimination has happened, and (2) the respondent, who tries to show that the charging party’s evidence is not valid. Understanding this process and its challenges will help ML researchers develop better methods for auditing models and mitigating discrimination.

# Acknowledgment

We would like to thank Alex Tamkin, Jacob Schreiber, Neel Guha, Peter Henderson, Megha Srivastava, and Michael Zhang for their useful feedback on this blog post.

1. https://www.nytimes.com/2020/06/15/us/gay-transgender-workers-supreme-court.html

2. Paraphrased from Fick, Barbara J. (1997). The American Bar Association guide to workplace law : everything you need to know about your rights as an employee or employer (1st ed.). New York: Times Books. ISBN 9780812929287

3. Example from page 8 of section 7 of the Title VI Legal Manual

4. In many cases, courts have shied away from drawing clear lines. See Clady v. Cty. of Los Angeles, 770 F.2d 1421, 1428–29 (9th Cir. 1985); accord Smith v. Xerox Corp., 196 F.3d at 366 (“[T]he substantiality of a disparity is judged on a case-by-case basis.”); Groves, 776 F. Supp. at 1526 (“There is no rigid mathematical threshold that must be met to demonstrate a sufficiently adverse impact.”).

5. Inclusive Communities, 135 S. Ct. at 2523 (citing Wards Cove, 490 U.S. at 653)

6. Albright, Alex. “If you give a judge a risk score: evidence from Kentucky bail decisions.” Harvard John M. Olin Fellow’s Discussion Paper 85 (2019).

7. Hardt, Moritz, Eric Price, and Nati Srebro. “Equality of opportunity in supervised learning.” Advances in Neural Information Processing Systems 29 (2016): 3315-3323.

8. Kamiran, Faisal, and Toon Calders. “Data preprocessing techniques for classification without discrimination.” Knowledge and Information Systems 33.1 (2012): 1-33.

9. Milli, Smitha, et al. “The social cost of strategic classification.” Proceedings of the Conference on Fairness, Accountability, and Transparency. 2019.

10. There are cases where the protected attributes are not known a priori. There are some methods to deal with disparity when protected attributes are not known, e.g., study the worst case group as a proxy for the protected groups in Hashimoto, Tatsunori, et al. “Fairness without demographics in repeated loss minimization.” International Conference on Machine Learning. PMLR, 2018 and Khani, Fereshte, Aditi Raghunathan, and Percy Liang. “Maximum weighted loss discrepancy.” arXiv preprint arXiv:1906.03518 (2019).

11. Example taken from section 7, page 33, of the Title VI Legal Manual

12. Pedreshi, Dino, Salvatore Ruggieri, and Franco Turini. “Discrimination-aware data mining.” Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. 2008.

13. Proposed as the outcome test by Becker in 1959.

14. Corbett-Davies, Sam, and Sharad Goel. “The measure and mismeasure of fairness: A critical review of fair machine learning.” arXiv preprint arXiv:1808.00023 (2018).

15. Nushi, Besmira, Ece Kamar, and Eric Horvitz. “Towards accountable ai: Hybrid human-machine analyses for characterizing system failure.” Proceedings of the AAAI Conference on Human Computation and Crowdsourcing. Vol. 6. 2018.

16. Singla, Sahil, et al. “Understanding failures of deep networks via robust feature extraction.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.

17. Ribeiro, Marco Tulio, et al. “Beyond accuracy: Behavioral testing of NLP models with CheckList.” arXiv preprint arXiv:2005.04118 (2020).

18. D’Amour, Alexander, et al. “Underspecification presents challenges for credibility in modern machine learning.” Journal of Machine Learning Research (2020).

19. Khani, Fereshte, Aditi Raghunathan, and Percy Liang. “Maximum weighted loss discrepancy.” arXiv preprint arXiv:1906.03518 (2019).

20. Examples from https://www.digitalhrtech.com/disparate-treatment/