TLDR: How do we finely control the fairness of machine learning systems? In our AISTATS 2019 paper, we introduce a theoretically grounded method for learning controllable fair representations. Using our method, a party who is concerned with fairness (like a data collector, community organizer, or regulatory body) can convert data to representations with controllable limits on unfairness, then release only the representations. This controls how much downstream machine learning models can discriminate.
Machine learning systems are increasingly used in high-stakes decisions, influencing credit scores, criminal sentences, and more. This raises an urgent question: how do we ensure these systems do not discriminate based on race, gender, disability, or other minority status? Many researchers have responded by introducing fair machine learning models that balance accuracy and fairness. However, this leaves it up to institutions (corporations, governments, etc.) to choose to use these fair models, and some of these institutions may be agnostic or even adversarial to fairness.
Interestingly, some researchers have introduced methods for learning fair representations ^{1}. Using such methods, a party who is concerned with fairness (like a data collector, community organizer, or regulatory body) can convert data to fair representations, then release only the representations, making it much more difficult for any downstream machine learning models to discriminate.
In this post, we introduce a theoretically grounded approach to learning fair representations, and we discover that a range of existing methods are special cases of our approach. We also note that all existing methods for learning fair representations can be said to balance usefulness and fairness, producing somewhat-useful-and-somewhat-fair representations. The concerned party must then run the learning process many times until they obtain representations they are satisfied with. Based on our theoretical approach, we introduce a new method where the concerned party can control the fairness of representations by requesting specific limits on unfairness. Compared to earlier fair representations, ours can be learned more quickly, can satisfy requests for many notions of fairness simultaneously, and contain more useful information.
A theoretical approach to fair representations
We assume we are given a set of data points ($x$), typically representing people, and their sensitive attributes ($u$), typically their race, gender, or other minority status. We must learn a model ($q_\phi(z \mid x, u)$) mapping any data point to a new representation ($z$). Our goal is twofold: our representations should be expressive, containing plenty of useful information about the data point; and our representations should be fair, containing limited information about the sensitive attributes, so that it is difficult to discriminate downstream^{2}. Note that merely removing sensitive attributes (e.g. race) from the data would not satisfy this notion of fairness, as downstream machine learning models could then discriminate based on correlated features (e.g. zipcode), a practice called “redlining”.
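To make the redlining point concrete, here is a minimal Python sketch on synthetic, hypothetical data (not one of the datasets used later). A binary sensitive attribute is dropped from the features, yet a simple per-region majority rule recovers it from a correlated "zipcode region" feature. The function name `simulate_redlining` and the parameter `p_match` are illustrative assumptions.

```python
import random
from collections import Counter, defaultdict


def simulate_redlining(n=10_000, p_match=0.9, seed=0):
    """Return how accurately u can be recovered from a correlated feature alone.

    Hypothetical synthetic data: u is a binary sensitive attribute, and the
    'zipcode region' equals u with probability p_match. The downstream model
    never sees u, only the region.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.randint(0, 1)
        region = u if rng.random() < p_match else 1 - u
        rows.append((region, u))

    # "Downstream model": predict the most common u observed in each region.
    counts = defaultdict(Counter)
    for region, u in rows:
        counts[region][u] += 1
    rule = {region: c.most_common(1)[0][0] for region, c in counts.items()}

    return sum(rule[region] == u for region, u in rows) / n


acc_correlated = simulate_redlining(p_match=0.9)   # region strongly tracks u
acc_independent = simulate_redlining(p_match=0.5)  # region carries no signal
print(f"recover u from region: {acc_correlated:.3f} vs {acc_independent:.3f}")
```

In this toy, deleting $u$ from the inputs still leaves it recoverable about 90% of the time. This is why the goal is stated in terms of how much information about $u$ the representation carries, rather than which columns it contains.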
First, we translate our goal into the information-theoretic concept of mutual information. The mutual information between two variables is formally defined as the Kullback-Leibler divergence between the joint distribution of the variables and the product of their marginal distributions: $I(a; b) = D_{\mathrm{KL}}\big(p(a, b) \,\|\, p(a)\,p(b)\big)$. Intuitively, it is the amount of information the two variables share. Our goals can be made concrete as follows:
- To achieve expressiveness, we want to maximize the mutual information between the data point $x$ and the representation $z$ conditioned on the sensitive attributes $u$: $\max I_q(x; z \mid u)$. (By conditioning on the sensitive attributes, we avoid encouraging information in the data point that is correlated with the sensitive attributes to appear in our representation.)
- To achieve fairness, we want to limit the mutual information between the representation $z$ and the sensitive attributes $u$: $I_q(z; u) < \epsilon$, where $\epsilon$ has been set by the concerned party.
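Both goals can be evaluated exactly on a tiny discrete example, using the KL-divergence definition of mutual information (the conditional version is the same divergence, averaged over $u$). The sketch below uses a hypothetical setup of our own. $u$ is a fair coin, and the data point is $x = (a, b)$, where $a$ agrees with $u$ 90% of the time and $b$ is an independent fair coin. It compares two deterministic encoders: one copies $a$ into $z$ and the other copies $b$. Encoders return a dictionary $\{z: q(z \mid x, u)\}$. All names and the limit $\epsilon = 0.1$ are illustrative.

```python
import math
from itertools import product


def toy_joint(encoder):
    """q(x, u, z) = p(x, u) * q(z | x, u) for the hypothetical toy setup."""
    joint = {}
    for u, a, b in product((0, 1), repeat=3):
        p_xu = 0.5 * (0.9 if a == u else 0.1) * 0.5  # p(u) * p(a | u) * p(b)
        x = (a, b)
        for z, q_z in encoder(x, u).items():
            joint[(x, u, z)] = joint.get((x, u, z), 0.0) + p_xu * q_z
    return joint


def marginal(joint, keep):
    """Sum out every coordinate of the (x, u, z) key not listed in keep."""
    out = {}
    for key, p in joint.items():
        k = tuple(key[i] for i in keep)
        out[k] = out.get(k, 0.0) + p
    return out


def expressiveness(joint):
    """I_q(x; z | u) in nats."""
    q_xu, q_uz, q_u = marginal(joint, (0, 1)), marginal(joint, (1, 2)), marginal(joint, (1,))
    return sum(p * math.log(p * q_u[(u,)] / (q_xu[(x, u)] * q_uz[(u, z)]))
               for (x, u, z), p in joint.items() if p > 0)


def leakage(joint):
    """I_q(z; u) in nats: KL(q(u, z) || q(u) q(z))."""
    q_uz, q_u, q_z = marginal(joint, (1, 2)), marginal(joint, (1,)), marginal(joint, (2,))
    return sum(p * math.log(p / (q_u[(u,)] * q_z[(z,)]))
               for (u, z), p in q_uz.items() if p > 0)


def copy_a(x, u):
    return {x[0]: 1.0}


def copy_b(x, u):
    return {x[1]: 1.0}


epsilon = 0.1  # hypothetical limit chosen by the concerned party
for name, enc in [("copy a", copy_a), ("copy b", copy_b)]:
    joint = toy_joint(enc)
    print(name, round(expressiveness(joint), 3), round(leakage(joint), 3),
          leakage(joint) < epsilon)
```

The encoder that copies $a$ leaks about 0.37 nats about $u$. Because most of what it carries is explained by $u$, it scores only about 0.33 nats of conditional expressiveness. The encoder that copies $b$ scores $\log 2 \approx 0.69$ nats with no leakage, so only it meets the limit.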
Next, because both mutual information terms are difficult to optimize, we need to find approximations:
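- Instead of maximizing $I_q(x; z \mid u)$, we maximize a lower bound $-L_r$, which relies on us introducing a new model $p_\theta(x \mid z, u)$. (Precisely, $I_q(x; z \mid u) \geq H(x \mid u) - L_r$ with $L_r = \mathbb{E}_{q_\phi(x, z, u)}[-\log p_\theta(x \mid z, u)]$. Since $H(x \mid u)$ does not depend on the models, maximizing $-L_r$ maximizes the bound.) Intuitively, maximizing $-L_r$ encourages a mapping $q_\phi$ such that the new model, which takes the representation plus the sensitive attributes, can successfully reconstruct the data point $x$.
- Instead of constraining $I_q(z; u)$, we can constrain an upper bound $C_1 = \mathbb{E}_{q_\phi(x, u)}[D_{\mathrm{KL}}(q_\phi(z \mid x, u) \,\|\, p(z))]$, where $p(z)$ is a prior over representations. Intuitively, constraining $C_1$ discourages complex representations.
Alternatively, we can constrain $C_2$, a tighter approximation of $I_q(z; u)$, which relies on us introducing a new model $p_\psi(u \mid z)$. Intuitively, constraining $C_2$ discourages a mapping where the new model, which takes only the representation, is able to reconstruct the sensitive attributes $u$.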
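These approximations can be sanity-checked numerically. The sketch below uses a small hypothetical toy of our own. $u$ is a fair coin. $x = 2a + b$, where $a$ agrees with $u$ 90% of the time and $b$ is an independent fair coin. The encoder copies $a$ into $z$ with probability 0.8.

The script checks three things:

- $H(x \mid u) - L_r \le I_q(x; z \mid u)$ holds for any decoder $p_\theta$, with equality for the exact posterior.
- $I_q(z; u) \le C_1$ holds with a uniform prior $p(z)$.
- For $C_2$, which the text does not spell out, we assume the common adversarial form $H(u) + \mathbb{E}_q[\log p_\psi(u \mid z)]$. This reaches $I_q(z; u)$ when $p_\psi$ is optimal and falls below it otherwise.

All function names are hypothetical.

```python
import math

# Hypothetical toy (our assumption, not from the paper): u is a fair coin;
# x = 2*a + b, where a equals u w.p. 0.9 and b is an independent fair coin;
# the encoder q(z | x, u) copies a into z w.p. 0.8 and flips it otherwise.
XS, US, ZS = range(4), range(2), range(2)


def p_xu(x, u):
    return 0.5 * (0.9 if x // 2 == u else 0.1) * 0.5


def q_z_given_xu(z, x, u):
    return 0.8 if z == x // 2 else 0.2


q = {(x, u, z): p_xu(x, u) * q_z_given_xu(z, x, u)
     for x in XS for u in US for z in ZS}


def marg(*keep):
    """Marginal of q over the listed coordinates of the (x, u, z) key."""
    out = {}
    for key, p in q.items():
        k = tuple(key[i] for i in keep)
        out[k] = out.get(k, 0.0) + p
    return out


q_xu, q_uz, q_u, q_z = marg(0, 1), marg(1, 2), marg(1), marg(2)

# Exact targets, computable only because the toy is tiny and discrete.
i_xz_u = sum(p * math.log(p * q_u[(u,)] / (q_xu[(x, u)] * q_uz[(u, z)]))
             for (x, u, z), p in q.items())
i_zu = sum(p * math.log(p / (q_u[(u,)] * q_z[(z,)])) for (u, z), p in q_uz.items())

# Lower bound on expressiveness: H(x | u) - L_r, L_r = E_q[-log p_theta(x | z, u)].
h_x_u = -sum(p * math.log(p / q_u[(u,)]) for (x, u), p in q_xu.items())


def l_r(decoder):
    return -sum(p * math.log(decoder(x, z, u)) for (x, u, z), p in q.items())


def posterior_decoder(x, z, u):  # best possible p_theta: the true q(x | z, u)
    return q[(x, u, z)] / q_uz[(u, z)]


def uniform_decoder(x, z, u):  # a poor p_theta that ignores z and u
    return 1 / len(XS)


# Upper bound C_1 = E_{p(x,u)} KL(q(z | x, u) || p(z)), with a uniform prior p(z).
c1 = sum(p_xu(x, u) * q_z_given_xu(z, x, u) * math.log(q_z_given_xu(z, x, u) / 0.5)
         for x in XS for u in US for z in ZS)

# ASSUMED form of C_2 (adversarial estimate): H(u) + E_q[log p_psi(u | z)].
h_u = -sum(p * math.log(p) for p in q_u.values())


def c2(adversary):
    return h_u + sum(p * math.log(adversary(u, z)) for (u, z), p in q_uz.items())


def best_adversary(u, z):  # p_psi equal to the true q(u | z)
    return q_uz[(u, z)] / q_z[(z,)]


def blind_adversary(u, z):  # ignores z entirely
    return 0.5


print(f"I(x;z|u) = {i_xz_u:.4f}; bound with posterior decoder = "
      f"{h_x_u - l_r(posterior_decoder):.4f}, with uniform decoder = "
      f"{h_x_u - l_r(uniform_decoder):.4f}")
print(f"I(z;u) = {i_zu:.4f}; C_1 = {c1:.4f}; assumed C_2 with best / blind "
      f"adversary = {c2(best_adversary):.4f} / {c2(blind_adversary):.4f}")
```

A better decoder tightens the lower bound, and a stronger adversary tightens the assumed $C_2$. $C_1$ stays a valid upper bound without any extra model, but it is looser.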
Putting it all together, our final objective is to find the models $q_\phi$, $p_\theta$, and $p_\psi$ that encourage the successful reconstruction of the data points while constraining both the complexity of the representations and the reconstruction of the sensitive attributes:
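Our “hard-constrained” objective for learning fair representations
$$\min_{\theta, \phi} \; L_r \quad \text{subject to} \quad C_1 < \epsilon_1, \quad C_2 < \epsilon_2,$$
where $\epsilon_1$ and $\epsilon_2$ are limits that have been set by the concerned party.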
This gives us a principled approach to learning fair representations. And we are rewarded with a neat discovery: it turns out that a range of existing methods for learning fair representations optimize the dual (a “soft-regularized” version) of our objective!
The “soft-regularized” loss function for learning fair representations
$$\min_{\theta, \phi} \; L_r + \lambda_1 C_1 + \lambda_2 C_2$$
| Existing method | The $\lambda_1$ they use | The $\lambda_2$ they use |
|---|---|---|
| Zemel et al. 2013^{3} | | |
| Edwards and Storkey 2015^{4} | | |
| Madras et al. 2018^{5} | | |
| Louizos et al. 2015^{6} | | |
We see that our framework generalizes a range of existing methods!
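With fixed $\lambda$s, the concerned party controls fairness only indirectly. A hypothetical one-parameter toy of our own (not from the paper) shows why. Here $L_r(w) = (w-3)^2$ stands in for the reconstruction loss and $C(w) = w^2$ for an unfairness approximation. Each call to the illustrative function `train_soft_regularized` is one full "learning run" for a fixed $\lambda$. To meet a limit $\epsilon$, the party must sweep $\lambda$ and retrain.

```python
def train_soft_regularized(lam, steps=2000, lr=0.05):
    """One 'training run' on a hypothetical 1-D toy: minimize
    L_r(w) + lam * C(w), with L_r(w) = (w - 3)**2 and C(w) = w**2."""
    w = 0.0
    for _ in range(steps):
        grad = 2 * (w - 3) + lam * 2 * w
        w -= lr * grad
    return w


def sweep(eps, lambdas):
    """Retrain for each lambda until the unfairness proxy C(w) = w**2 < eps."""
    runs = 0
    for lam in lambdas:
        runs += 1
        w = train_soft_regularized(lam)
        if w ** 2 < eps:
            return lam, w, runs
    return None, None, runs


lam, w, runs = sweep(eps=1.2, lambdas=[0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
print(lam, round(w, 3), runs)  # 2.0 1.0 5
```

On this grid, the first $\lambda$ that meets $\epsilon = 1.2$ is $\lambda = 2$, found after five runs. It lands at $C = 1.0$, which is fairer than requested and therefore less expressive than necessary. The $\lambda$ that would hit the limit exactly ($\approx 1.74$) is not on the grid.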
Learning controllable fair representations
Let’s now take a closer look at the “soft-regularized” loss function. It should feel intuitive that existing methods for learning fair representations produce somewhat-useful-and-somewhat-fair representations, with the balance between expressiveness and fairness controlled by the choice of $\lambda$s. If only we could optimize our “hard-constrained” objective instead, the concerned party could directly set the $\epsilon$s to request specific limits on unfairness . . .
Luckily, there’s a way! We introduce:
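Our loss function for learning controllable fair representations
$$\max_{\lambda_1, \lambda_2 \geq 0} \; \min_{\theta, \phi} \; L_r + \lambda_1 (C_1 - \epsilon_1) + \lambda_2 (C_2 - \epsilon_2)$$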
Intuitively, this loss function dictates that whenever we should be concerned about unfairness because $C_1 > \epsilon_1$ or $C_2 > \epsilon_2$, the $\lambda$s will place additional emphasis on the unsatisfied constraint. This additional emphasis will persist until $C_1$ and $C_2$ return to satisfying the limits set by the concerned party. The rest of the time, when $C_1$ and $C_2$ are safely within the limits, minimizing $L_r$ will be prioritized, encouraging expressive representations.
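The behavior described above can be sketched on a hypothetical one-parameter toy of our own. $L_r(w) = (w-3)^2$ stands in for the reconstruction loss, $C(w) = w^2$ for an unfairness approximation, and the requested limit is $\epsilon = 1.2$. The sketch alternates gradient descent on $w$ for $L_r + \lambda (C - \epsilon)$ with projected gradient ascent on $\lambda \ge 0$. This is a generic primal-dual scheme written for illustration. The function name and step sizes are our assumptions, not the paper's training setup.

```python
def train_controllable(eps, steps=20_000, lr_w=0.05, lr_lam=0.05):
    """Hypothetical 1-D toy: min_w L_r(w) s.t. C(w) < eps, with
    L_r(w) = (w - 3)**2 and C(w) = w**2, solved via
    max_{lam >= 0} min_w L_r(w) + lam * (C(w) - eps)."""
    w, lam = 0.0, 0.0
    for _ in range(steps):
        # Descent step on the model parameter for the current multiplier.
        grad_w = 2 * (w - 3) + lam * 2 * w
        w -= lr_w * grad_w
        # Ascent step on the multiplier: it grows while C(w) > eps and
        # shrinks (never below 0) once the constraint is satisfied.
        lam = max(0.0, lam + lr_lam * (w ** 2 - eps))
    return w, lam


w, lam = train_controllable(eps=1.2)
print(round(w ** 2, 3), round(lam, 3))
```

A single run settles at $C(w) \approx 1.2$ with $\lambda \approx 1.74$. The multiplier is found automatically rather than by a grid search, and the constraint ends up exactly at the requested limit. As a result, no expressiveness is given up beyond what $\epsilon$ demands.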
Results
With this last piece of the puzzle in place, all that’s left to do is evaluate whether our theory leads to learning controllable fair representations in practice. To evaluate, we learn representations of three real-world datasets:
- the UCI German credit dataset of 1,000 individuals, where the binary sensitive attribute age<50/age>50 was to be protected;
- the UCI Adult dataset of 40,000 adults from the US Census, where the binary sensitive attribute Man/Woman was to be protected^{7};
- and the Heritage Health dataset of 60,000 patients, where the sensitive attribute to be protected was the intersection of age and gender: the age group (of 9 possible age groups) × the gender (Man/Woman)^{8}.
Sure enough, our results confirm that, in all three sets of learned representations, the concerned party’s choices for $\epsilon_1$ and $\epsilon_2$ control the approximations of unfairness $C_1$ and $C_2$.
Our results also demonstrate that, compared to existing methods, our method can produce more expressive representations.
Our method can also satisfy limits on many notions of fairness simultaneously:
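| | Expressiveness (higher is more expressive) | $C_1$ (lower is fairer) | $C_2$, demographic parity (lower is fairer) | $C_2$, equality of odds (lower is fairer) | $C_2$, equality of opportunity (lower is fairer) |
|---|---|---|---|---|---|
| constraints | none | < 10 | < 0.1 | < 0.1 | < 0.1 |
| our method | 9.94 | 9.95 | 0.08 | 0.09 | 0.04 |
| existing methods | 9.34 | 9.39 | 0.09 | 0.10 | 0.07 |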
While these last two results may seem surprising, they occur because existing methods require the concerned party to run the learning process many times until they find roughly satisfying representations. Our method, by contrast, directly optimizes for representations that are as expressive as possible while satisfying all of the concerned party’s limits on unfairness.
Takeaways
To complement fair machine learning models that corporations and governments can choose to use, this work takes a step towards putting control of fair machine learning in the hands of a party concerned with fairness, such as a data collector, community organizer, or regulatory body. We contribute a theoretical approach to learning fair representations that make it much more difficult for downstream machine learning models to discriminate. We also contribute a new method that allows the concerned party to control the fairness of the representations by requesting specific limits on unfairness, $\epsilon$.
When working on fair machine learning, it is particularly important to acknowledge limitations and blind spots, or we risk building toy solutions while overshadowing others’ work towards equity. A major limitation of our work is that the concerned party’s limits $\epsilon$ constrain only an approximation of unfairness. We hope that future work can go further and map $\epsilon$ to formal guarantees about the fairness of downstream machine learning. Another potential limitation of this work is that we, like much of the fair machine learning community, center demographic parity, equality of odds, and equality of opportunity as notions of fairness. We believe that future work will need to develop deeper connections to social-justice-informed notions of equity if it is to avoid shallow techno-solutionism and build more equitable machine learning^{9}.
This post is based on our AISTATS 2019 paper:
Learning Controllable Fair Representations
Jiaming Song*, Pratyusha Kalluri*, Aditya Grover, Shengjia Zhao, Stefano Ermon

1. Madras, David, Elliot Creager, Toniann Pitassi, and Richard Zemel. “Learning Adversarially Fair and Transferable Representations.” In ICML, 2018.
2. For conciseness, we focus on demographic parity, a pretty intuitive and strict notion of fairness, but our approach works with many notions of fairness, as shown in our results.
3. Zemel, Rich, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. “Learning Fair Representations.” In ICML, 2013.
4. Edwards, Harrison, and Amos Storkey. “Censoring Representations with an Adversary.” In ICLR, 2015.
5. Madras, David, Elliot Creager, Toniann Pitassi, and Richard Zemel. “Learning Adversarially Fair and Transferable Representations.” In ICML, 2018.
6. Louizos, Christos, Kevin Swersky, Yujia Li, Max Welling, and Richard Zemel. “The Variational Fair Autoencoder.” In ICLR, 2016.
7. Gender is not binary, and the treatment of gender as binary when using these datasets is problematic and a limitation of this work.
8. Gender is not binary, and the treatment of gender as binary when using these datasets is problematic and a limitation of this work.
9. For more on this, we strongly recommend reading “A People’s Guide to AI” by Mimi Onuoha and Mother Cyborg. Allied Media Projects. 2018.