¹ George Mason University
Email: [email protected], [email protected]
² University of Pittsburgh Medical Center; ³ Guangdong Provincial People’s Hospital; ⁴ University of Notre Dame; ⁵ University of Pittsburgh

Data-Algorithm-Architecture Co-Optimization for Fair Neural Networks on Skin Lesion Dataset

Yi Sheng¹, Junhuan Yang¹, Jinyang Li¹, James Alaina², Xiaowei Xu³, Yiyu Shi⁴, Jingtong Hu⁵, Weiwen Jiang¹, Lei Yang¹
Abstract

As Artificial Intelligence (AI) becomes increasingly integrated into our daily lives, fairness has emerged as a critical concern, particularly in medical AI, where datasets often reflect inherent biases due to social factors such as the underrepresentation of marginalized communities and socioeconomic barriers to data collection. Traditional approaches to mitigating these biases have focused on data augmentation and the development of fairness-aware training algorithms. However, this paper argues that the architecture of neural networks, a core component of Machine Learning (ML), also plays a crucial role in ensuring fairness. We demonstrate that addressing fairness effectively requires a holistic approach that simultaneously considers data, algorithms, and architecture. Using Automated ML (AutoML) technology, specifically Neural Architecture Search (NAS), we introduce a novel framework, BiaslessNAS, designed to achieve fair outcomes in skin lesion analysis. BiaslessNAS incorporates fairness considerations at every stage of the NAS process, identifying neural networks that are not only more accurate but also significantly fairer. Our experiments show that, compared with existing neural architectures, BiaslessNAS achieves up to a 2.55% increase in accuracy and up to a 65.59% improvement in fairness, underscoring the importance of integrating fairness into neural network architecture for better outcomes in medical AI applications.

Keywords:
AI-powered dermatology; Fairness; Neural Architecture Search.

1 Introduction

The democratization of AI is rapidly expanding the use of machine learning, notably neural networks, across medical disciplines [34, 36], with dermatology leading the way owing to the availability of comprehensive skin lesion datasets [9]. However, unlike general-purpose image datasets such as ImageNet [18], skin lesion datasets often exhibit biases, particularly with respect to skin tone. This imbalance poses a significant challenge for machine learning in dermatology, as it can yield models that are accurate on average but perform poorly for underrepresented groups. As shown in Fig. 1(i), our analysis of the ISIC2019 dermatology dataset [5] revealed an accuracy gap of over 10% between lighter and darker skin tones, despite an overall accuracy of 81.71%. Skin-tone bias is not limited to academic datasets; it is also prevalent in commercial AI applications, including facial-analysis tools [4] and Skin Image Search platforms [17].

Prior studies [21, 20] have shown that data bias significantly affects the fairness of machine learning (ML) models. Through a review of the full ML process, we found that, beyond data, training algorithms and neural architectures also influence fairness (Fig. 1(ii)), and Table 2 shows that co-optimizing these factors yields the best performance. These factors are interconnected, suggesting that optimizing them in isolation may not yield the most equitable outcomes. While previous studies have focused on enhancing fairness from the data [32, 23] or algorithmic [7, 22, 8, 16, 26] perspective, the role of neural architecture remains underexplored. Neural Architecture Search (NAS), which has gained attention for improving model performance and efficiency [15, 14, 25, 24], involves search space formulation, architecture evaluation, and an optimizer that evolves the search. This process offers a unique avenue to integrate data processing, training algorithms, and architecture search within a unified framework, yet fairness has largely been overlooked in NAS, especially for biomedical data.

Figure 1: Bias issue behind training dataset and three fairness-related factors

In response, this paper introduces BiaslessNAS, a comprehensive framework that leverages NAS to co-optimize data, training algorithms, and neural architecture. BiaslessNAS embeds fairness awareness into each phase of the NAS process, so that these elements are optimized jointly for fairness in skin lesion analysis. This approach not only addresses the gap in incorporating fairness into NAS but also sets a new standard for developing equitable ML models in biomedical applications. Experimental results show that BiaslessNAS achieves the highest accuracy while improving fairness by 33.13%. When a small accuracy degradation is tolerated, BiaslessNAS finds a fairer neural architecture with a 65.59% fairness improvement.

2 Related Work

Given biased data, existing approaches fall into two directions: (1) data bias removal and (2) fair training. Data bias removal: one way to remove bias is to build a balanced dataset; however, this is a time-consuming process. An alternative is data augmentation; for example, [6] generates biased sets to artificially increase the minority data. Fair training: in addition to data balancing [11, 29], techniques have been proposed that modify the training algorithm to address fairness. The authors of [28, 19] apply adversarial training and add a discrimination module to improve fairness.

Our work takes a different angle by considering the neural architecture when addressing fairness. We propose a framework that jointly optimizes neural architectures, training algorithms, and data augmentation; the debiasing methods mentioned above can be integrated into this framework.

3 Method

3.1 Fairness Metric Definition and Factor Investigation

Given a neural architecture $N$ and datasets $\langle T, D \rangle$, where $T$ is the training set and $D$ is the validation set, $N$ is trained on $T$ to produce the model $f'_N$, which is then validated on $D$ to obtain the accuracy $A(f'_N, D)$. Fairness becomes a concern because samples in $D$ carry additional attributes (e.g., skin tone) that divide $D$ into groups $\{D_{g_1}, D_{g_2}, \cdots, D_{g_K}\}$; we denote the set of groups by $G = \{g_1, \ldots, g_K\}$.
For example, if a dataset contains two skin tones (i.e., $g_1 = light\_skin$ and $g_2 = dark\_skin$), the accuracy of model $f'_N$ on group $g_i$ is denoted as $A(f'_N, D_{g_i})$.

We define the “unfairness score” $U(f'_N, D)$ based on the overall accuracy and the group accuracies. Specifically, following [19], we calculate $U(f'_N, D)$ as the L1 norm of the gaps between each group's accuracy and the overall accuracy:

$$U(f'_N, D) = \sum_{\forall g_i \in G} \left| A(f'_N, D_{g_i}) - A(f'_N, D) \right|. \tag{1}$$
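The unfairness score in Eq. (1) can be computed directly from per-sample predictions and group labels. The sketch below is a minimal illustration, not the authors' code; the helper names `accuracy`, `unfairness_from_accuracies`, and `unfairness_score` are hypothetical. It treats accuracies as fractions in [0, 1], which is consistent with Table 1: plugging the mean light, dark, and overall accuracies of the MobileNetV2 row into Eq. (1) reproduces its reported score of 0.2264.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Sequence


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match their labels."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and equally long")
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


def unfairness_from_accuracies(group_accs: Iterable[float], overall_acc: float) -> float:
    """Eq. (1): L1 distance between each group's accuracy and the overall accuracy."""
    return sum(abs(a - overall_acc) for a in group_accs)


def unfairness_score(preds: Sequence[int], labels: Sequence[int],
                     groups: Sequence[Hashable]) -> float:
    """Split the validation set by sensitive attribute (e.g. skin tone), then apply Eq. (1)."""
    overall = accuracy(preds, labels)
    per_group = defaultdict(lambda: ([], []))  # group -> (preds, labels)
    for p, y, g in zip(preds, labels, groups):
        per_group[g][0].append(p)
        per_group[g][1].append(y)
    group_accs = [accuracy(gp, gy) for gp, gy in per_group.values()]
    return unfairness_from_accuracies(group_accs, overall)


# MobileNetV2 row of Table 1 (light, dark, overall accuracy as fractions).
print(round(unfairness_from_accuracies([0.8190, 0.5926], 0.8169), 4))  # 0.2264
```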

The results in Fig. 1(ii) show that different architectures ($N$) have different unfairness scores. We further investigate the influence of the training approach and data preprocessing. In Fig. 1(ii), “Training Imp.” denotes modifying the training loss to account for fairness, and “Data Imp.” denotes data balancing that increases the number of samples in minority groups to improve fairness. Both approaches reduce the unfairness score. More interestingly, the three factors $N$, $f'$, and $D$ are coupled with each other, which suggests that optimizing them simultaneously is the most effective way to minimize the unfairness score.

3.2 BiaslessNAS Framework

Overview of BiaslessNAS framework: Fig. 2 shows an overview of BiaslessNAS, which is composed of four components: ➀ a reinforcement learning (RL) optimizer, ➁ a search space, ➂ a fairness-aware trainer, and ➃ a fairness and accuracy evaluator. Specifically, a recurrent neural network (RNN)-based controller guides the optimization process by sampling a batch generation method ($BGM$) and a neural architecture (a.k.a. child network) $N$ from the search space. The fairness-aware trainer then trains the child network. Next, the evaluator assesses the trained model to obtain its accuracy and unfairness score. From these metrics, a reward is generated and used to update the RNN in the controller. The details of each component are introduced below.
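The four-component loop can be sketched as follows. This is a minimal, framework-agnostic sketch rather than the authors' implementation: `sample`, `train`, `evaluate`, `reward_fn`, and `update` are hypothetical callables standing in for components ➀ to ➃, and keeping the best candidate by reward is an assumption (Table 1 ranks models by the Eq. 2 reward).

```python
from typing import Any, Callable, Optional, Tuple

# (reward, bgm, arch, accuracy, unfairness)
Candidate = Tuple[float, Any, Any, float, float]


def biasless_search(
    sample: Callable[[], Tuple[Any, Any]],           # ➀/➁ controller samples (BGM, child network N)
    train: Callable[[Any, Any], Any],                # ➂ fairness-aware trainer returns f'_N
    evaluate: Callable[[Any], Tuple[float, float]],  # ➃ evaluator returns (A, U) on D
    reward_fn: Callable[[float, float], float],      # reward of Eq. (2)
    update: Callable[[float], None],                 # ➀ controller update (policy gradient)
    iterations: int,
) -> Optional[Candidate]:
    best: Optional[Candidate] = None
    for _ in range(iterations):
        bgm, arch = sample()
        model = train(arch, bgm)
        acc, unfair = evaluate(model)
        r = reward_fn(acc, unfair)
        update(r)  # feed the reward back to the controller
        if best is None or r > best[0]:
            best = (r, bgm, arch, acc, unfair)
    return best
```

Passing the components in as callables keeps the loop independent of any particular controller, trainer, or dataset.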

RL Optimizer: The controller iteratively predicts the hyperparameters of the batch generation method $BGM$ and the child network $N$. In each iteration, the controller receives a reward to update its RNN. The reward $R$ is computed from the outputs of the evaluator (see ➃), namely the accuracy $A(f'_N, D)$ and the unfairness score $U(f'_N, D)$, as follows:

$$R = \begin{cases} \alpha \cdot A(f'_N, D) - \beta \cdot U(f'_N, D), & A(f'_N, D) \geq AC \\ -1, & \text{otherwise} \end{cases} \tag{2}$$

where $\alpha$ and $\beta$ are scaling factors that can be adjusted according to the specific demands on accuracy or fairness, and $AC$ is the minimum accuracy required of the model on the full dataset $D$.
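A direct transcription of Eq. (2) is shown below as a sketch. The $(\alpha, \beta)$ pairs are the two settings reported in Sec. 4; the accuracy threshold $AC = 0.75$ is hypothetical, since its value is not stated. Using the mean accuracy and unfairness of BiaslessNAS-Fair and BiaslessNAS-Acc from Table 1 (as fractions) illustrates how each setting favors a different model.

```python
def reward(acc: float, unfair: float, alpha: float, beta: float, acc_min: float) -> float:
    """Eq. (2): alpha * A - beta * U when A >= AC, otherwise a fixed penalty of -1."""
    if acc >= acc_min:
        return alpha * acc - beta * unfair
    return -1.0


# Settings from Sec. 4; acc_min (AC) = 0.75 is a hypothetical threshold.
FAIR = dict(alpha=0.2, beta=0.8, acc_min=0.75)
ACC = dict(alpha=0.8, beta=0.2, acc_min=0.75)

# Mean (A, U) of BiaslessNAS-Fair and BiaslessNAS-Acc from Table 1.
fair_model, acc_model = (0.7951, 0.0779), (0.8424, 0.1514)
print(reward(*fair_model, **FAIR) > reward(*acc_model, **FAIR))  # True
print(reward(*acc_model, **ACC) > reward(*fair_model, **ACC))    # True
```

The hard penalty of -1 ensures that no architecture below the accuracy requirement can win by being fair alone.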

Based on the reward, we employ reinforcement learning to update the controller. Specifically, we apply the Monte Carlo policy gradient algorithm [35]:

$$\nabla J(\theta) = \frac{1}{m} \sum_{k=1}^{m} \sum_{t=1}^{T} \gamma^{T-t} \, \nabla_{\theta} \log \pi_{\theta}(a_t \mid a_{(t-1):1}) \, (R_k - b) \tag{3}$$

where $m$ is the batch size (the number of episodes sampled per update) and $T$ is the number of steps in each episode. Rewards are discounted by an exponential factor $\gamma$, and the baseline $b$ is an exponential moving average of past rewards.
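Eq. (3) can be illustrated with a small pure-Python sketch. To keep it self-contained, it assumes a simplified policy: the RNN controller is replaced by one independent softmax distribution per decision step, so $\pi_\theta(a_t \mid a_{(t-1):1})$ here ignores earlier actions (in BiaslessNAS the RNN supplies that conditioning). The moving-average decay of 0.95 and the toy logits are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def policy_gradient(logits: List[List[float]], episodes: List[List[int]],
                    rewards: List[float], baseline: float, gamma: float) -> List[List[float]]:
    """Eq. (3) for a policy with one independent categorical per decision step.

    logits[t][j]   -- score of option j at step t (stand-in for the RNN output)
    episodes[k][t] -- option chosen at step t in the k-th sampled episode
    rewards[k]     -- reward R_k of that episode (Eq. 2)
    """
    m, T = len(episodes), len(logits)
    grad = [[0.0] * len(row) for row in logits]
    for actions, r_k in zip(episodes, rewards):
        advantage = r_k - baseline
        for t in range(T):
            # t is 0-based, so gamma**(T - 1 - t) equals gamma^(T - t) with 1-based t.
            weight = gamma ** (T - 1 - t) * advantage / m
            for j, p in enumerate(softmax(logits[t])):
                # Gradient of log softmax(logits)[a_t] w.r.t. logit j: 1[j == a_t] - p_j.
                grad[t][j] += weight * ((1.0 if j == actions[t] else 0.0) - p)
    return grad


def update_baseline(b: float, rewards: Sequence[float], decay: float = 0.95) -> float:
    """Exponential moving average of the mean batch reward, used as b."""
    return decay * b + (1.0 - decay) * sum(rewards) / len(rewards)


# One gradient-ascent step on a toy 2-step policy with 3 options per step.
logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
episodes, rewards, b = [[0, 2], [1, 2]], [0.10, -1.0], 0.0
g = policy_gradient(logits, episodes, rewards, b, gamma=0.99)
logits = [[x + 0.1 * dx for x, dx in zip(row, drow)] for row, drow in zip(logits, g)]
b = update_baseline(b, rewards)
```

Subtracting the baseline $b$ does not change the expected gradient but reduces its variance, which matters when each reward requires training a full child network.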

Figure 2: Overview of BiaslessNAS: ➀ controller: generating a reward $R$ and updating the recurrent neural network (RNN)-based controller; ➁ search space: sampling a set of hyperparameters based on the updated controller to obtain the batch composition of groups’ data and a child network; ➂ fairness-aware trainer: training the identified child network on the batches generated from the training dataset; ➃ evaluator: generating the accuracy and unfairness score of the trained neural network $f'_N$.

Data/Architecture Fusing Search Space: The search space is composed of two sets of hyperparameters: (1) hyperparameters for $BGM$ and (2) hyperparameters for the child network architecture $N$.

Batch Generation. The idea behind $BGM$ is to adjust the composition of data from different groups within one training batch. We define $o_i$ as the fraction of images in a batch that come from sub-dataset $D_{g_i}$. Letting $BS$ be the batch size, $o_i \times BS$ is the number of images drawn from $D_{g_i}$, subject to the constraint $\sum_{\forall g_i \in G} o_i = 1$.
To avoid accuracy degradation caused by oversampling minority groups, we add the following constraint: $\forall g_i, g_j \in G$, if $|D_{g_i}| \leq |D_{g_j}|$, then $o_i \leq o_j$, where $|D_{g_k}|$ denotes the size of sub-dataset $D_{g_k}$.
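The batch-generation step and its two constraints can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' loader: the function names are hypothetical, largest-remainder rounding is our choice for turning $o_i \times BS$ into integers that sum to $BS$, sampling with replacement is used only when a group has fewer images than its quota, and the group sizes in the usage lines are made up.

```python
import random
from typing import Dict, List


def valid_bgm(ratios: Dict[str, float], sizes: Dict[str, int], tol: float = 1e-9) -> bool:
    """Check both BGM constraints: ratios sum to 1; a smaller group never gets a larger ratio."""
    if abs(sum(ratios.values()) - 1.0) > tol:
        return False
    return all(ratios[gi] <= ratios[gj] + tol
               for gi in ratios for gj in ratios
               if sizes[gi] <= sizes[gj])


def batch_counts(ratios: Dict[str, float], batch_size: int) -> Dict[str, int]:
    """Turn o_i x BS into integers that add up to BS (largest-remainder rounding)."""
    raw = {g: o * batch_size for g, o in ratios.items()}
    counts = {g: int(v) for g, v in raw.items()}
    leftover = batch_size - sum(counts.values())
    for g in sorted(raw, key=lambda g: raw[g] - counts[g], reverse=True)[:leftover]:
        counts[g] += 1
    return counts


def make_batch(indices: Dict[str, List[int]], ratios: Dict[str, float],
               batch_size: int, rng: random.Random) -> List[int]:
    """Draw one batch of sample indices with the group composition given by the BGM."""
    batch: List[int] = []
    for g, n in batch_counts(ratios, batch_size).items():
        pool = indices[g]
        # Without replacement when the group is large enough, otherwise with replacement.
        batch += rng.sample(pool, n) if n <= len(pool) else rng.choices(pool, k=n)
    rng.shuffle(batch)
    return batch


sizes = {"light": 900, "dark": 100}  # hypothetical group sizes
indices = {"light": list(range(900)), "dark": list(range(900, 1000))}
ratios = {"light": 0.7, "dark": 0.3}  # dark is oversampled relative to 0.1, but not above light
if valid_bgm(ratios, sizes):
    batch = make_batch(indices, ratios, 32, random.Random(0))
```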

Neural Architecture. We use a linear chain of blocks as the backbone architecture. The design of the basic blocks is inspired by popular convolutional neural networks, including VGG-Net [31], MobileNet [13], and ResNet [12]. As shown in Fig. 2 ➁, we use four types of basic blocks: MobileNetV2-inspired blocks (MB and DB), a ResNet-inspired block (RB), and a VGG-inspired block (CB). Each basic block has four hyperparameters: the channel numbers ($CH1$, $CH2$, and $CH3$) and the kernel size ($K$). Note that $CH1$ is not a searchable hyperparameter: for two adjacent blocks ($A_i \rightarrow A_j$), $CH1$ of $A_j$ equals $CH3$ of $A_i$. Besides the four block types, a block can also be a skip operation, which gives the search flexibility over the depth of the neural network.
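The channel-chaining rule and the skip option can be made concrete with a random sampler over this search space. This is a sketch only: the block names come from Fig. 2, but the candidate channel and kernel values, the number of block slots, and the 3 input channels (RGB) are hypothetical, and we assume a skip block passes its input channels through unchanged. In BiaslessNAS the choices are made by the RNN controller rather than uniformly at random.

```python
import random
from dataclasses import dataclass
from typing import List

BLOCK_TYPES = ["MB", "DB", "RB", "CB", "skip"]  # from Fig. 2; "skip" removes the slot
CHANNEL_CHOICES = [16, 32, 64, 128]             # hypothetical value set for CH2 / CH3
KERNEL_CHOICES = [3, 5, 7]                      # hypothetical value set for K


@dataclass
class Block:
    kind: str
    ch1: int  # input channels: inherited from the previous block, not searched
    ch2: int
    ch3: int  # output channels: becomes CH1 of the next block
    k: int


def sample_architecture(num_slots: int, in_channels: int, rng: random.Random) -> List[Block]:
    """Sample a linear chain of blocks; skip slots shorten the network."""
    blocks: List[Block] = []
    prev_out = in_channels
    for _ in range(num_slots):
        kind = rng.choice(BLOCK_TYPES)
        if kind == "skip":
            continue  # identity: depth shrinks, channel count passes through unchanged
        ch2, ch3 = rng.choice(CHANNEL_CHOICES), rng.choice(CHANNEL_CHOICES)
        blocks.append(Block(kind, prev_out, ch2, ch3, rng.choice(KERNEL_CHOICES)))
        prev_out = ch3  # enforce CH1(A_j) == CH3(A_i)
    return blocks


arch = sample_architecture(num_slots=8, in_channels=3, rng=random.Random(0))
```

Deriving $CH1$ from the previous block keeps every sampled chain shape-consistent, which is why it is excluded from the searchable hyperparameters.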

Fairness-aware Trainer: Given an identified architecture (i.e., child network $N$) and $BGM$, the fairness-aware trainer trains the child network to produce a trained model $f'_N$. Specifically, we first create batches of data from the training dataset $T$ using $BGM$. The model is then trained with a fairness-aware loss function, and after the iterative training process we obtain $f'_N$.

In particular, the fairness-aware loss function leverages the hyperparameters in $BGM$. Denote by $B_{g_i}$ the sub-batch of samples from sub-dataset $D_{g_i}$; then $|B_{g_i}| = o_i \times BS$, where $|\cdot|$ is the size of a dataset or batch and $o_i$ is the ratio in $BGM$. Each sample $s \in B_{g_i}$ has a target label $T_s$ and a prediction $P_s$. After forward propagation, we compute a weighted cross-entropy loss as follows:

$$L = -\sum_{g_i \in G} \sum_{s \in B_{g_i}} \frac{\max_{g_j \in G} o_j}{o_i} \cdot T_s \log P_s, \tag{4}$$

where $\max_{g_j \in G} o_j$ is the ratio of the largest group in a batch; dividing it by $o_i$ yields a scaling factor of at least 1 that up-weights samples from groups with smaller ratios. The resulting fairness-aware loss is then used for backward propagation.
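A minimal sketch of Eq. (4) on explicit probabilities follows; `fairness_aware_loss` is a hypothetical name and the toy batch is made up. Because $T_s$ is one-hot, $T_s \log P_s$ reduces to the log-probability of the true class. Like Eq. (4), the loss is summed rather than averaged over the batch.

```python
import math
from typing import Dict, List, Sequence


def fairness_aware_loss(probs: List[Sequence[float]], targets: List[int],
                        groups: List[str], ratios: Dict[str, float]) -> float:
    """Eq. (4): cross-entropy where each sample of group g_i is scaled by max_j o_j / o_i.

    probs[s]   -- predicted class distribution P_s of sample s
    targets[s] -- index of the true class (T_s is one-hot)
    groups[s]  -- group of sample s; ratios[g] is o_g from the sampled BGM
    """
    o_max = max(ratios.values())
    loss = 0.0
    for p, y, g in zip(probs, targets, groups):
        weight = o_max / ratios[g]  # >= 1; largest for the least-sampled group
        loss -= weight * math.log(p[y])
    return loss


# Toy batch: three light-skin samples and one dark-skin sample, o = (0.75, 0.25).
probs = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7]]
targets = [0, 0, 0, 1]
groups = ["light", "light", "light", "dark"]
loss = fairness_aware_loss(probs, targets, groups, {"light": 0.75, "dark": 0.25})
```

In a PyTorch trainer, the same weighting can be obtained by multiplying `torch.nn.functional.cross_entropy(logits, targets, reduction="none")` by the per-sample weights and summing.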

Fairness and Accuracy Evaluator: Given the trained model $f'_N$, the accuracy $A(f'_N, D)$ and the unfairness score $U(f'_N, D)$ (Eq. 1) are computed on the validation dataset $D$. Both are then used to calculate the reward in the ➀ RL Optimizer.

4 Experiment

Dataset and settings. We use the Fair and Intelligent Embedded System Challenge (ESFair) dataset [3], which is composed of data from ISIC2019, Dermnet [2], and Atlas [1]. There are five dermatological diseases to classify. We compare the solutions obtained by BiaslessNAS with a set of existing neural architectures, including MobileNetV2 [27], ResNet [33], and MnasNet [tan2019mnasnet]. All models are trained from scratch with the same hyperparameters on a GPU cluster with 48 RTX 3080 GPUs. The learning rate starts at 0.01 and decays by a factor of 0.9 every 20 steps; the batch size is 32, and training runs for 500 epochs.
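The learning-rate setting is read here as a step decay (multiply by 0.9 once every 20 steps). This reading is an assumption, and whether a "step" is an epoch or an iteration is not stated. Under that reading the schedule is the one below; in PyTorch it would correspond to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.9)`.

```python
def learning_rate(step: int, base_lr: float = 0.01, decay: float = 0.9, every: int = 20) -> float:
    """Step decay: multiply the base rate by `decay` once per `every` steps."""
    if step < 0:
        raise ValueError("step must be non-negative")
    return base_lr * decay ** (step // every)


# Rates at the start and end of the first decay periods.
schedule = [learning_rate(s) for s in (0, 19, 20, 40, 60)]
```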

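Table 1: Accuracy and unfairness score (mean ± standard deviation) of existing neural architectures and BiaslessNAS, computed over the top-5 models of each architecture ranked by the reward in Eq. 2. Acc. Imp. is the change in overall accuracy (percentage points) and Fair. Imp. is the relative reduction in unfairness score, both with respect to MobileNetV2.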
| Model | Light Acc. (%) | Dark Acc. (%) | Overall Acc. (%) | Acc. Imp. | Unfair. Score | Fair. Imp. |
|---|---|---|---|---|---|---|
| MobileNetV2 | 81.90 ± 0.78 | 59.26 ± 1.2 | 81.69 ± 0.77 | baseline | 0.2264 ± 0.0194 | baseline |
| ResNet18 | 82.54 ± 1.48 | 63.59 ± 1.14 | 82.36 ± 1.47 | 0.67% ↑ | 0.1894 ± 0.0233 | 16.34% ↑ |
| ResNet34 | 82.95 ± 0.69 | 67.18 ± 1.14 | 82.81 ± 0.67 | 1.12% ↑ | 0.1577 ± 0.0181 | 30.34% ↑ |
| MnasNet | 76.54 ± 1.20 | 61.02 ± 3.34 | 76.40 ± 1.22 | 5.29% ↓ | 0.1551 ± 0.0253 | 31.49% ↑ |
| BiaslessNAS-Fair | 79.58 ± 0.18 | 71.79 ± 2.57 | 79.51 ± 0.20 | 2.18% ↓ | 0.0779 ± 0.0252 | 65.59% ↑ |
| BiaslessNAS-Acc | 84.37 ± 0.53 | 69.23 ± 1.81 | 84.24 ± 0.52 | 2.55% ↑ | 0.1514 ± 0.0226 | 33.13% ↑ |
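The two improvement columns of Table 1 can be recomputed from the mean values. The check below reproduces every reported entry under one reading: Acc. Imp. is an absolute difference in percentage points, and Fair. Imp. is the relative reduction of the unfairness score, both against MobileNetV2. The names `BASE_ACC`, `ROWS`, and `improvements` are illustrative.

```python
from typing import Tuple

BASE_ACC, BASE_UNFAIR = 81.69, 0.2264  # MobileNetV2 row (baseline)

ROWS = {  # model: (overall accuracy in %, mean unfairness score)
    "ResNet18": (82.36, 0.1894),
    "ResNet34": (82.81, 0.1577),
    "MnasNet": (76.40, 0.1551),
    "BiaslessNAS-Fair": (79.51, 0.0779),
    "BiaslessNAS-Acc": (84.24, 0.1514),
}


def improvements(acc: float, unfair: float) -> Tuple[float, float]:
    """Return (Acc. Imp. in percentage points, Fair. Imp. as relative % reduction of U)."""
    acc_gain = acc - BASE_ACC
    fair_gain = 100.0 * (BASE_UNFAIR - unfair) / BASE_UNFAIR
    return round(acc_gain, 2), round(fair_gain, 2)


for name, (acc, unfair) in ROWS.items():
    acc_gain, fair_gain = improvements(acc, unfair)
    print(f"{name}: {acc_gain:+.2f} pts accuracy, {fair_gain:.2f}% fairness")
```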

Evaluation of BiaslessNAS. Table 1 reports the evaluation results. The two BiaslessNAS architectures, BiaslessNAS-Fair and BiaslessNAS-Acc, are the ones with the lowest unfairness score and the highest accuracy, respectively. The framework has two hyperparameters in the reward (Eq. 2): Alpha ($\alpha$) scales accuracy and Beta ($\beta$) scales fairness. We explore two settings: BiaslessNAS-Fair uses a larger Beta (0.8) and a smaller Alpha (0.2), while BiaslessNAS-Acc uses a larger Alpha (0.8) and a smaller Beta (0.2). For a fair comparison of different neural architectures ($N$), all competitors are trained with the proposed fairness-aware data processing ($D$) and trainer ($f'$). As shown in Table 1, BiaslessNAS-Fair achieves competitive accuracy with the lowest unfairness score of all models. Specifically, its unfairness score is only 0.0779 on average, a 65.59% improvement in fairness over MobileNetV2. Meanwhile, BiaslessNAS-Acc achieves the highest accuracy with a lower unfairness score than all existing models.

Figure 3: Visualization of BiaslessNAS-Fair and BiaslessNAS-Acc, together with their performance on different fairness metrics

Neural Architecture Visualization. Fig. 3(a)-(b) show the neural architectures found by BiaslessNAS, highlighting the structural differences between BiaslessNAS-Fair and BiaslessNAS-Acc. Although they share identical block types across layers, the two architectures differ in their channel numbers. Notably, both begin with a MobileNet block for initial feature processing, followed by denser convolutional and residual blocks that handle features from diverse groups. This visualization underscores the impact of neural architecture on fairness and suggests that strategically varying block types, particularly at the beginning and end of the architecture, can enhance fairness. This observation supports the premise that thoughtful architectural design is crucial for developing fair and effective models.

BiaslessNAS is Fairer on Different Metrics. In addition to the unfairness score defined in Eq. 1, we evaluate BiaslessNAS on two other commonly used fairness metrics: Disparate Impact (DI) [10] and Statistical Parity Difference (SPD) [19]. Fig. 3(c)-(d) present the comparison. In Fig. 3(c), BiaslessNAS-Fair achieves the highest DI value, indicating that it is fairer than the other examined architectures. For SPD, values closer to zero are preferable, and Fig. 3(d) shows that BiaslessNAS-Fair and BiaslessNAS-Acc are the top performers on this metric. Together, these findings demonstrate that BiaslessNAS identifies solutions that are fairer than conventional neural architectures across different metrics.
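For reference, the standard forms of the two metrics are sketched below. The text does not define the favorable outcome for this multi-class task, so the sketch assumes it is a correct diagnosis and treats dark skin as the unprivileged group. The exact variants used in [10] and [19] may differ; the helper names and toy data are hypothetical.

```python
from typing import Sequence


def favorable_rate(favorable: Sequence[bool], groups: Sequence[str], group: str) -> float:
    """P(favorable outcome | group)."""
    outcomes = [f for f, g in zip(favorable, groups) if g == group]
    if not outcomes:
        raise ValueError(f"no samples for group {group!r}")
    return sum(outcomes) / len(outcomes)


def disparate_impact(favorable, groups, unprivileged: str, privileged: str) -> float:
    """DI = P(fav | unprivileged) / P(fav | privileged); 1.0 means parity."""
    return (favorable_rate(favorable, groups, unprivileged)
            / favorable_rate(favorable, groups, privileged))


def statistical_parity_difference(favorable, groups, unprivileged: str, privileged: str) -> float:
    """SPD = P(fav | unprivileged) - P(fav | privileged); 0.0 means parity."""
    return (favorable_rate(favorable, groups, unprivileged)
            - favorable_rate(favorable, groups, privileged))


# Toy example: "favorable" = the model's diagnosis was correct (assumption).
correct = [True, True, True, False, True, False]
skin = ["light", "light", "light", "light", "dark", "dark"]
di = disparate_impact(correct, skin, "dark", "light")
spd = statistical_parity_difference(correct, skin, "dark", "light")
```

DI is undefined when the privileged group's favorable rate is zero, so the division is only safe on non-degenerate evaluation sets.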

Figure 4: Evaluation of fairness-aware trainer on the existing neural architectures

Evaluation of fairness-aware trainer: This ablation study fixes $N$ and $D$ and compares results for different $f'$. Fig. 4 shows the evaluation results of the fairness-aware trainer on four existing neural architectures. The baseline for each architecture sets $\frac{o_l}{o_d} = \frac{|D_{g_l}|}{|D_{g_d}|}$, i.e., the batch generator preserves the dataset's ratio between light-skin and dark-skin images when loading data. In contrast, the fairness-aware trainer (denoted as $FAT$) sets $\frac{o_l}{o_d}$ to 1.

In these figures, each dot corresponds to one solution: dots with a cross represent the baseline approach, and plain dots represent the FAT approach. The results in Fig. 4 lead to several observations. (1) FAT finds neural architectures with lower unfairness scores. (2) However, if the goal is to maximize accuracy regardless of fairness, the baseline performs better than FAT (with one exception: on MobileNetV2, FAT dominates the baseline). More specifically, when comparing the fairest architectures (i.e., the left-most dots for each approach in Fig. 4), FAT achieves 10.52%, 50.20%, 36.98%, and 37.82% reductions in unfairness score on the four architectures, respectively. These results show that, with the same neural architecture and data augmentation, the fairness-aware trainer can indeed improve fairness, but the possible accuracy degradation must be taken into account.
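The difference between the baseline and FAT batch compositions described above is easy to quantify. The sketch below uses a hypothetical 9:1 light-to-dark dataset (the real group sizes are not given here) with the batch size of 32 from Sec. 4; the function name is illustrative.

```python
from typing import Tuple


def light_dark_split(n_light: int, n_dark: int, batch_size: int, fat: bool) -> Tuple[int, int]:
    """Images per batch from the (light, dark) groups.

    Baseline: o_l / o_d = |D_l| / |D_d| (dataset proportions).
    FAT:      o_l / o_d = 1 (equal share per group).
    """
    o_dark = 0.5 if fat else n_dark / (n_light + n_dark)
    dark = round(o_dark * batch_size)
    return batch_size - dark, dark


# Hypothetical 9:1 light-to-dark dataset with the paper's batch size of 32.
baseline = light_dark_split(9000, 1000, 32, fat=False)  # (29, 3)
balanced = light_dark_split(9000, 1000, 32, fat=True)   # (16, 16)
```

Under the baseline, each batch contains only a handful of dark-skin images, whereas FAT gives both groups an equal share. This is consistent with FAT improving fairness at some cost in overall accuracy.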

Table 2: Quantitative Analysis of Three Fairness-related Factors on MobileNetV2
| Model | Accuracy | Unfairness (↓) | DI (↑) | Ranking |
|---|---|---|---|---|
| MobileNetV2 (vanilla) | 81.05% | 0.2325 | 0.71 | 5 |
| MobileNetV2 with $f'$ | 81.34% | 0.2105 | 0.74 | 4 |
| MobileNetV2 with ($D$ + $f'$) | 82.14% | 0.1528 | 0.81 | 2 |
| FairNAS with $N$ [30] | 84.06% | 0.1755 | 0.79 | 3 |
| BiaslessNAS-Acc with ($D$ + $f'$ + $N$) | 84.24% | 0.1514 | 0.82 | 1 |

Evaluation of different optimization combinations. This ablation study evaluates various optimization combinations to assess the benefits of co-optimization. The results, summarized in Table 2, compare different strategies against a baseline MobileNetV2 architecture. We first examine MobileNetV2 in its standard form, then with the fairness-aware trainer (denoted as $f'$), and finally with both the fairness-aware trainer and data augmentation ($D + f'$). Co-optimizing the trainer and the data significantly improves the fairness of MobileNetV2: the unfairness score drops from 0.2325 to 0.1528 and the disparate impact (DI) rises from 0.71 to 0.81. We next consider FairNAS [30], a fairness-aware neural architecture search (NAS) that seeks fair neural architectures ($N$) without a fairness-aware trainer or data augmentation. Interestingly, FairNAS is fairer than MobileNetV2 with $f'$ alone (unfairness 0.1755 vs. 0.2105) but less fair than MobileNetV2 with $D + f'$ (0.1755 vs. 0.1528), although it achieves higher accuracy (84.06% vs. 82.14%). Finally, BiaslessNAS-Acc, which co-optimizes data, algorithm, and architecture ($D + f' + N$), outperforms FairNAS in both accuracy (84.24% vs. 84.06%) and fairness (unfairness 0.1514 vs. 0.1755), and ranks first among all configurations in Table 2. Jointly optimizing data, algorithm, and architecture is therefore the most effective strategy for improving both the accuracy and the fairness of the resulting models.
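Disparate impact is commonly defined, following Feldman et al. [10], as the ratio of favorable-outcome rates between the unprivileged and privileged groups, so a value of 1 indicates parity and values below 1 indicate that the unprivileged group is disadvantaged. The Python sketch below assumes that definition, treats a correct diagnosis as the favorable outcome, and treats dark-skin images as the unprivileged group; these choices and the function names are illustrative assumptions, and the exact metric used in Table 2 may differ.

```python
def group_rate(outcomes, groups, group):
    """Fraction of favorable outcomes (1s) among the samples of `group`."""
    selected = [o for o, g in zip(outcomes, groups) if g == group]
    if not selected:
        raise ValueError(f"no samples for group {group!r}")
    return sum(selected) / len(selected)


def disparate_impact(outcomes, groups, unprivileged="dark", privileged="light"):
    """Ratio of favorable-outcome rates, unprivileged over privileged.

    1.0 means parity; values below 1.0 mean the unprivileged group
    receives the favorable outcome less often.
    """
    return group_rate(outcomes, groups, unprivileged) / group_rate(
        outcomes, groups, privileged
    )


if __name__ == "__main__":
    # Hypothetical per-image outcomes: 1 = correct diagnosis, 0 = incorrect.
    groups = ["light"] * 10 + ["dark"] * 10
    outcomes = [1] * 9 + [0] + [1] * 7 + [0] * 3
    print(f"DI = {disparate_impact(outcomes, groups):.2f}")  # DI = 0.78
```

Here the light-skin group is diagnosed correctly 90% of the time and the dark-skin group 70% of the time, giving a ratio of 0.7 / 0.9.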

The above results yield three insights. (1) The neural architecture indeed affects fairness, and it can have an even larger impact on fairness than the fairness-aware trainer: FairNAS, which optimizes only $N$, reaches an unfairness score of 0.1755, compared with 0.2105 for MobileNetV2 with $f'$. (2) Neural architecture search is good at identifying architectures with high accuracy, but without the help of a fairness-aware trainer and data augmentation, it may fail to optimize fairness within the search loop. (3) Co-optimization is essential for achieving the best accuracy-fairness trade-off.

5 Conclusion

In this paper, we investigate the factors that influence fairness in ML systems and show that jointly optimizing data, algorithms, and neural architectures better balances accuracy and fairness. We introduce a novel framework, BiaslessNAS, designed for this holistic optimization and specifically targeting the inherent biases in skin lesion datasets. To ensure both accuracy and fairness, BiaslessNAS incorporates a fairness-aware training mechanism that constructs balanced data batches and applies a weighted loss to improve the fairness of minority groups. In addition, a reinforcement learning optimizer steers the co-optimization process, and our results show that this integrated approach markedly outperforms traditional methods that optimize data, algorithms, and architecture separately. Our evaluations confirm that co-optimization significantly enhances fairness without compromising accuracy.
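The weighted loss mentioned above can be sketched in Python as a group-weighted cross-entropy. The inverse-frequency weighting, the per-group (rather than per-class) weights, and the helper names are assumptions for illustration; the exact weighting scheme used by BiaslessNAS is not specified in this section.

```python
import math
from collections import Counter


def inverse_frequency_weights(groups):
    """Weight each group by n / (k * count_g): minority groups get larger
    weights, and perfectly balanced groups all get weight 1."""
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return {g: n / (k * c) for g, c in counts.items()}


def group_weighted_ce(probs, labels, groups, weights):
    """Weighted mean of the per-sample cross-entropy -log p[y],
    normalized by the sum of the sample weights."""
    total, norm = 0.0, 0.0
    for p, y, g in zip(probs, labels, groups):
        w = weights[g]
        total += -w * math.log(p[y])
        norm += w
    return total / norm


if __name__ == "__main__":
    # Hypothetical batch: three light-skin samples and one dark-skin sample.
    groups = ["light", "light", "light", "dark"]
    labels = [0, 0, 1, 1]
    probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
    uniform = {"light": 1.0, "dark": 1.0}
    weighted = inverse_frequency_weights(groups)
    print(f"unweighted: {group_weighted_ce(probs, labels, groups, uniform):.3f}")
    print(f"weighted:   {group_weighted_ce(probs, labels, groups, weighted):.3f}")
```

Because the single dark-skin sample is poorly classified and carries weight 2 instead of 1, the weighted loss is larger than the unweighted one, so gradient updates focus more on the minority group.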

5.0.1 Acknowledgements

We gratefully acknowledge the support of the National Institutes of Health (NIH) (Award No. 1R01EB033387-01).

5.0.2 Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

References

  • [1] Dermatology atlas. http://www.atlasdermatologico.com.br/, accessed Nov. 2021
  • [2] DermNet dataset. http://www.dermnet.com/, accessed Nov. 2021
  • [3] Fair and intelligent embedded system challenge at ESWEEK 2023. https://esfair2023.github.io/ESFair/Submission.html
  • [4] Gender and skin-type bias in commercial AI systems. https://news.mit.edu/2018/study-finds-gender-skin-type-bias-artificial-intelligence-systems-0212
  • [5] Skin lesion analysis. https://challenge2019.isic-archive.com/
  • [6] Abusitta, A., Aïmeur, E., Wahab, O.A.: Generative adversarial networks for mitigating biases in machine learning systems. arXiv preprint arXiv:1905.09972 (2019)
  • [7] Bahng, H., Chun, S., Yun, S., Choo, J., Oh, S.J.: Learning de-biased representations with biased representations. In: International Conference on Machine Learning. pp. 528–539. PMLR (2020)
  • [8] Chiu, C.H., Chung, H.W., Chen, Y.J., Shi, Y., Ho, T.Y.: Toward fairness through fair multi-exit framework for dermatological disease diagnosis. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 97–107. Springer (2023)
  • [9] De, A., Sarda, A., Gupta, S., Das, S.: Use of artificial intelligence in dermatology. Indian Journal of Dermatology 65(5), 352 (2020)
  • [10] Feldman, M., Friedler, S.A., Moeller, J., Scheidegger, C., Venkatasubramanian, S.: Certifying and removing disparate impact. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 259–268 (2015)
  • [11] Hao, W., El-Khamy, M., Lee, J., Zhang, J., Liang, K.J., Chen, C., Duke, L.C.: Towards fair federated learning with zero-shot data augmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 3310–3319 (2021)
  • [12] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778 (2016)
  • [13] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
  • [14] Jiang, W., Yang, L., Sha, E.H.M., Zhuge, Q., Gu, S., Dasgupta, S., Shi, Y., Hu, J.: Hardware/software co-exploration of neural architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 39(12), 4805–4815 (2020)
  • [15] Jiang, W., Zhang, X., Sha, E.H.M., Yang, L., Zhuge, Q., Shi, Y., Hu, J.: Accuracy vs. efficiency: Achieving both through FPGA-implementation aware neural architecture search. In: Proceedings of the 56th Annual Design Automation Conference 2019. pp. 1–6 (2019)
  • [16] Jin, C., Che, T., Peng, H., Li, Y., Pavone, M.: Learning from teaching regularization: Generalizable correlations should be easy to imitate. arXiv preprint arXiv:2402.02769 (2024)
  • [17] Kamulegeya, L.H., Okello, M., Bwanika, J.M., Musinguzi, D., Lubega, W., Rusoke, D., Nassiwa, F., Börve, A.: Using artificial intelligence on dermatology conditions in Uganda: A case for diversity in training data sets for machine learning. bioRxiv p. 826057 (2019)
  • [18] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25 (2012)
  • [19] Li, X., Cui, Z., Wu, Y., Gu, L., Harada, T.: Estimating and improving fairness with adversarial learning. arXiv preprint arXiv:2103.04243 (2021)
  • [20] Miranda, T.C., Gimenez, P.F., Lalande, J.F., Tong, V.V.T., Wilke, P.: Debiasing Android malware datasets: How can I trust your results if your dataset is biased? IEEE Transactions on Information Forensics and Security 17, 2182–2197 (2022)
  • [21] Nakajima, S., Chen, T.Y.: Generating biased dataset for metamorphic testing of machine learning programs. In: IFIP International Conference on Testing Software and Systems. pp. 56–64. Springer (2019)
  • [22] Nam, J., Cha, H., Ahn, S., Lee, J., Shin, J.: Learning from failure: De-biasing classifier from biased classifier. Advances in Neural Information Processing Systems 33, 20673–20684 (2020)
  • [23] Ouyang, N., Huang, Q., Li, P., Cai, Y., Liu, B., Leung, H.f., Li, Q.: Suppressing biased samples for robust VQA. IEEE Transactions on Multimedia 24, 3405–3415 (2022). https://doi.org/10.1109/TMM.2021.3097502
  • [24] Peng, H., Huang, S., Zhou, T., Luo, Y., Wang, C., Wang, Z., Zhao, J., Xie, X., Li, A., Geng, T., et al.: AutoReP: Automatic ReLU replacement for fast private network inference. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 5178–5188 (2023)
  • [25] Peng, H., Ran, R., Luo, Y., Zhao, J., Huang, S., Thorat, K., Geng, T., Wang, C., Xu, X., Wen, W., et al.: LinGCN: Structural linearized graph convolutional network for homomorphically encrypted inference. Advances in Neural Information Processing Systems 36 (2024)
  • [26] Peng, H., Xie, X., Shivdikar, K., Hasan, M.A., Zhao, J., Huang, S., Khan, O., Kaeli, D., Ding, C.: MaxK-GNN: Extremely fast GPU kernel design for accelerating graph neural networks training. In: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. pp. 683–698 (2024)
  • [27] Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: Inverted residuals and linear bottlenecks. In: Proc. of CVPR. pp. 4510–4520 (2018)
  • [28] Shafahi, A., Najibi, M., Ghiasi, M.A., Xu, Z., Dickerson, J., Studer, C., Davis, L.S., Taylor, G., Goldstein, T.: Adversarial training for free! Advances in Neural Information Processing Systems 32 (2019)
  • [29] Sharma, S., Zhang, Y., Ríos Aliaga, J.M., Bouneffouf, D., Muthusamy, V., Varshney, K.R.: Data augmentation for discrimination prevention and bias disambiguation. In: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. pp. 358–364 (2020)
  • [30] Sheng, Y., Yang, J., Wu, Y., Mao, K., Shi, Y., Hu, J., Jiang, W., Yang, L.: The larger the fairer? Small neural networks can achieve fairness for edge devices. In: Proceedings of the 59th ACM/IEEE Design Automation Conference. pp. 163–168 (2022)
  • [31] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
  • [32] Spinde, T., Krieger, D., Plank, M., Gipp, B.: Towards a reliable ground-truth for biased language detection. In: 2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL). pp. 324–325. IEEE (2021)
  • [33] Targ, S., Almeida, D., Lyman, K.: ResNet in ResNet: Generalizing residual architectures. arXiv preprint arXiv:1603.08029 (2016)
  • [34] Wang, T., Xu, X., Xiong, J., Jia, Q., Yuan, H., Huang, M., Zhuang, J., Shi, Y.: ICA-UNet: ICA inspired statistical UNet for real-time 3D cardiac cine MRI segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 447–457. Springer (2020)
  • [35] Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8(3–4), 229–256 (1992)
  • [36] Zheng, H., Han, J., Wang, H., Yang, L., Zhao, Z., Wang, C., Chen, D.Z.: Hierarchical self-supervised learning for medical image segmentation based on multi-domain data aggregation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 622–632. Springer (2021)