Evolutionary Multitasking Neural Networks: Accelerating Drug Discovery Through Parallel Optimization

Madelyn Parker Nov 26, 2025

This article explores the emerging paradigm of evolutionary multitasking (EMT) for training neural networks, with a specialized focus on applications in drug discovery and development. It establishes the foundational principles of EMT, which enables the simultaneous optimization of multiple related tasks by leveraging synergistic knowledge transfer. The content details cutting-edge methodological frameworks and their practical implementation for challenges such as drug-target interaction prediction and feature selection in high-dimensional bioinformatics data. It further provides crucial insights for troubleshooting common optimization pitfalls and presents a rigorous validation framework based on benchmarking standards from the CEC 2025 competition. Aimed at researchers and drug development professionals, this comprehensive review synthesizes theoretical advances with practical applications, outlining how EMT can significantly reduce computational costs and accelerate the identification of novel therapeutic candidates.

The Foundations of Evolutionary Multitasking: From Biological Inspiration to Computational Power

Evolutionary Multitasking (EMT) represents a paradigm shift in evolutionary computation, enabling the simultaneous optimization of multiple tasks by exploiting their underlying synergies. Unlike traditional isolated approaches that solve problems independently, EMT fosters implicit knowledge transfer between tasks, often leading to accelerated convergence, improved solution quality, and more efficient resource utilization. This protocol outlines the core principles, methodologies, and applications of EMT, with a special focus on its transformative potential in training neural networks and its implications for complex research domains such as drug development.

Core Principles and Definitions

Evolutionary Multitasking optimization (EMTO) moves beyond the conventional single-task focus of evolutionary algorithms by formulating an environment where K distinct optimization tasks are solved concurrently [1] [2]. The fundamental goal is to find a set of optimal solutions {x*_1, ..., x*_K}, where each x*_i is the best solution for its respective task, by leveraging potential complementarities between the tasks [2].

The Multifactorial Evolutionary Algorithm (MFEA), a pioneering EMT algorithm, introduces several key concepts for comparing individuals in a multitasking environment [1]:

Factorial Cost: The performance of an individual on a specific task, incorporating objective value and constraint violation.
Skill Factor: The task on which an individual performs best.
Scalar Fitness: A unified measure of an individual's overall performance across all tasks, derived from its factorial ranks.
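
The three definitions above can be made concrete with a small sketch. It assumes that a lower factorial cost is better, that ranks are 1-based, and that scalar fitness is the reciprocal of an individual's best rank over all tasks; the function names and the cost matrix are illustrative and not taken from [1].

```python
from typing import List


def factorial_ranks(costs: List[List[float]]) -> List[List[int]]:
    """costs[i][k] is the factorial cost of individual i on task k (lower is better).

    Returns ranks[i][k], the 1-based rank of individual i on task k.
    An individual that was not evaluated on a task can carry float("inf") there.
    """
    n, n_tasks = len(costs), len(costs[0])
    ranks = [[0] * n_tasks for _ in range(n)]
    for k in range(n_tasks):
        order = sorted(range(n), key=lambda i: costs[i][k])  # ties broken by index
        for position, i in enumerate(order, start=1):
            ranks[i][k] = position
    return ranks


def skill_factors(ranks: List[List[int]]) -> List[int]:
    """Index of the task on which each individual ranks best."""
    return [min(range(len(r)), key=lambda k: r[k]) for r in ranks]


def scalar_fitness(ranks: List[List[int]]) -> List[float]:
    """Reciprocal of each individual's best factorial rank."""
    return [1.0 / min(r) for r in ranks]


# Illustrative cost matrix: 3 individuals, 2 tasks.
costs = [
    [0.2, 9.0],
    [0.5, 1.0],
    [0.9, 3.0],
]
ranks = factorial_ranks(costs)
print(ranks)                  # [[1, 3], [2, 1], [3, 2]]
print(skill_factors(ranks))   # [0, 1, 1]
print(scalar_fitness(ranks))  # [1.0, 1.0, 0.5]
```

Because scalar fitness depends only on ranks, individuals specialized in tasks with very different objective scales can be compared directly during selection.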

Knowledge transfer in EMT is primarily realized through assortative mating and vertical cultural transmission [1]. When two parent individuals with different skill factors reproduce, genetic material is exchanged, allowing for the implicit transfer of beneficial traits across tasks. This process is often governed by a random mating probability (rmp) parameter, which controls the frequency of inter-task crossover [3].

Application Notes: EMT in Neural Network Training and Research

The principles of EMT are particularly well-suited for the complex, multi-faceted challenges of artificial neural network (ANN) design and training. The traditional approach of sequentially optimizing architecture and parameters can be suboptimal and prone to catastrophic forgetting when a network is required to perform multiple tasks [4]. EMT offers a unified framework to address these issues.

Table 1: Evolutionary Multitasking Applications in Neural Network Research

Application Domain	EMT Approach	Key Benefit	Citation
Bi-Level Neural Architecture Search	Upper level minimizes network complexity; lower level optimizes training parameters to minimize loss.	Discovers compact, efficient architectures without compromising predictive performance.	[5]
Developmental Neural Networks	Uses Cartesian Genetic Programming to evolve developmental programs that build ANNs capable of multiple tasks.	Mitigates catastrophic forgetting; incorporates Activity Dependence for self-regulation.	[4]
Hybrid BCI Channel Selection	Formulates channel selection for Motor Imagery and SSVEP tasks as a multi-objective problem solved simultaneously.	Balances channel count and classification accuracy for multiple signal types efficiently.	[6]
Color Categorization Research	Probes a CNN trained for object recognition with an evolutionary algorithm to find invariant color category boundaries.	Provides evidence that color categories can emerge as a byproduct of learning visual skills.	[7] [8]

Key Signaling Pathway: Two-Level Transfer Learning

A significant advancement in EMT is the Two-Level Transfer Learning (TLTL) algorithm, which enhances the basic MFEA by structuring knowledge transfer more efficiently [1].

Diagram 1: Two-Level Transfer Learning Workflow

The Upper Level (Inter-Task Transfer) focuses on transferring knowledge between different optimization tasks. It moves beyond simple random crossover by incorporating elite individual learning, thereby reducing randomness and enhancing search efficiency. This level exploits inter-task commonalities and similarities [1].

The Lower Level (Intra-Task Transfer) operates within a single task, transmitting information from one dimension (decision variable) to other dimensions. This across-dimension transfer helps accelerate convergence within a complex task's own search space [1].

Experimental Protocols

This section provides a detailed, reproducible methodology for implementing and evaluating an Evolutionary Multitasking algorithm, using the foundational MFEA and a competitive multitasking variant as examples.

Protocol 1: Base Multifactorial Evolutionary Algorithm (MFEA)

Objective: To simultaneously solve K single-objective optimization tasks using implicit genetic transfer.

Materials and Reagents:

Software: A programming environment with computational capabilities (e.g., Python, MATLAB).
Data: Definition of the K optimization tasks, including their search spaces Ω_k and objective functions F_k.

Procedure:

Initialization:
- Generate a unified population P of N individuals.
- Randomly assign a skill factor (dominant task) to each individual.
- Evaluate each individual only on its skill factor task to conserve computational resources.

Evolutionary Cycle (Repeat for G generations):
- a. Assortative Mating:
  - Randomly select two parent candidates, pa and pb, from the population.
  - If pa and pb have the same skill factor, or a uniform random number in [0, 1) is less than the rmp parameter, perform crossover and mutation to generate offspring ca and cb. If the parents' skill factors differ, each offspring randomly imitates the skill factor of one of the parents.
  - Otherwise, generate each offspring by applying mutation directly to one parent; the offspring inherits that parent's skill factor.
- b. Evaluation: Evaluate each offspring individual only on its assigned skill factor task.
- c. Selection: Select the fittest individuals from the combined pool of parents and offspring to form the population for the next generation, based on scalar fitness.
Output:
- Upon completion, the population contains high-quality solutions for each of the K tasks. The best individual for a task is identified by its factorial cost on that task.
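
The protocol above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the reference implementation from [1]: the two toy tasks, the shared [0, 1] gene encoding, uniform crossover and Gaussian mutation (simple stand-ins for operators such as SBX), the balanced initial skill-factor assignment, and all function names are choices made here for brevity.

```python
import random

# Two toy minimization tasks (illustrative, not from the cited works).
def sphere(x):
    return sum(v * v for v in x)

def shifted_sphere(x):
    return sum((v - 1.0) ** 2 for v in x)

TASKS = [sphere, shifted_sphere]
LOW, HIGH = -5.0, 5.0  # both tasks share this box; genes live in [0, 1]

def evaluate(genes, skill):
    """Decode unified genes into the task domain and return the factorial cost."""
    return TASKS[skill]([LOW + g * (HIGH - LOW) for g in genes])

def crossover(a, b, rng):
    """Uniform crossover: each gene pair is swapped with probability 0.5."""
    pairs = [(ga, gb) if rng.random() < 0.5 else (gb, ga) for ga, gb in zip(a, b)]
    return [p[0] for p in pairs], [p[1] for p in pairs]

def mutate(genes, rng, sigma=0.03):
    """Gaussian mutation clipped to the unified space [0, 1]."""
    return [min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in genes]

def select(pool, size):
    """Keep the individuals with the highest scalar fitness (1 / rank within own task)."""
    scored = []
    for k in range(len(TASKS)):
        group = sorted((ind for ind in pool if ind[1] == k), key=lambda ind: ind[2])
        scored.extend((1.0 / rank, ind) for rank, ind in enumerate(group, start=1))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ind for _, ind in scored[:size]]

def mfea(dim=5, pop_size=40, generations=100, rmp=0.3, seed=0):
    rng = random.Random(seed)
    # Random but balanced skill factors, so every task starts with representatives.
    skills = [i % len(TASKS) for i in range(pop_size)]
    rng.shuffle(skills)
    pop = []
    for skill in skills:
        genes = [rng.random() for _ in range(dim)]
        pop.append((genes, skill, evaluate(genes, skill)))  # (genes, skill factor, cost)
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            pa, pb = rng.sample(pop, 2)
            if pa[1] == pb[1] or rng.random() < rmp:
                # Assortative mating; children imitate one parent's skill factor.
                ca, cb = crossover(pa[0], pb[0], rng)
                ca, cb = mutate(ca, rng), mutate(cb, rng)
                child_skills = [rng.choice((pa[1], pb[1])) for _ in range(2)]
            else:
                ca, cb = mutate(pa[0], rng), mutate(pb[0], rng)
                child_skills = [pa[1], pb[1]]
            for genes, skill in zip((ca, cb), child_skills):
                offspring.append((genes, skill, evaluate(genes, skill)))
        pop = select(pop + offspring, pop_size)
    # Best individual per task, identified by its factorial cost on that task.
    return [min((ind for ind in pop if ind[1] == k), key=lambda ind: ind[2])
            for k in range(len(TASKS))]

best = mfea()
for genes, skill, cost in best:
    print(f"task {skill}: best cost {cost:.4f}")
```

Each individual is evaluated only on its skill-factor task, so the budget per generation equals the number of offspring regardless of how many tasks are being solved.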

Protocol 2: Competitive Multitasking for Endmember Extraction (CMTEE)

Objective: To solve a group of related but competitive tasks (in this case, endmember extraction from hyperspectral images with varying numbers of endmembers) using online resource allocation [9].

Materials and Reagents:

Data: A hyperspectral image cube.
Model: A linear spectral mixture model (LSMM) to represent the data.
Software: An optimization environment capable of implementing evolutionary algorithms and linear algebra operations.

Procedure:

Problem Formulation:
- Define a set of optimization tasks {T1, T2, ..., TK}, where each task Tk represents the endmember extraction problem for a specific number of endmembers, k.
- The objective function for each task is typically based on reconstruction error.

Algorithm Execution:
- These tasks are considered competitive as they vie for the best representation of the same underlying data.
- Implement a multitasking evolutionary framework where a single population evolves solutions for all tasks simultaneously.
- Employ an online resource allocation strategy. This strategy dynamically monitors the performance (e.g., improvement rate) of each task and assigns more computational resources (e.g., more fitness evaluations) to tasks that are showing promise, and fewer to those that are stagnating.
Output:
- A set of Pareto-optimal solutions that provide a trade-off between the number of endmembers and the reconstruction accuracy for the hyperspectral image.
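
One simple way to realize the online resource allocation step is to split each round's evaluation budget in proportion to the recent improvement of every task, while guaranteeing a small minimum so that stagnating tasks are not starved. The rule below is a generic sketch; the exact allocation rule used by CMTEE is not given here, and the function name, the `floor` parameter, and the improvement values are assumptions for illustration.

```python
def allocate_evaluations(improvements, budget, floor=1):
    """Split `budget` fitness evaluations among tasks.

    improvements[k] is the recent improvement rate of task k (negative values
    are treated as zero). Every task receives at least `floor` evaluations;
    the remainder is shared in proportion to improvement.
    """
    n = len(improvements)
    if budget < floor * n:
        raise ValueError("budget too small for the requested floor")
    remaining = budget - floor * n
    gains = [max(0.0, g) for g in improvements]
    total = sum(gains)
    if total == 0:
        shares = [remaining / n] * n          # no task improved: split evenly
    else:
        shares = [remaining * g / total for g in gains]
    alloc = [floor + int(s) for s in shares]
    # Hand out the evaluations lost to rounding, largest fractional part first.
    leftover = budget - sum(alloc)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


# Illustrative improvement rates for three tasks (not measured values).
print(allocate_evaluations([0.30, 0.10, 0.0], budget=100, floor=5))  # [69, 26, 5]
```

The floor keeps a stagnating task alive, which matters if it later starts to benefit from knowledge transferred by the other tasks.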

Table 2: Quantitative Results from EMT Applications

Algorithm / Study	Metric 1	Performance	Metric 2	Performance	Baseline Comparison	Citation
EB-LNAST (Bi-Level NAS)	Predictive Accuracy	Competitive (≤0.99% reduction)	Model Size	99.66% reduction	vs. Tuned MLPs	[5]
BOMTEA (Adaptive Bi-Operator)	Overall Performance on CEC17/CEC22	Significantly outperformed comparative algorithms	Adaptive ESO Selection	Effective for CIHS, CIMS, CILS problems	vs. MFEA, MFDE	[3]
CMTEE (Hyperspectral Extraction)	Convergence Speed	Accelerated	Extraction Accuracy	Improved	vs. Single-task runs	[9]
TLTL Algorithm	Convergence Rate	Fast	Global Search Ability	Outstanding	vs. State-of-the-art EMT	[1]

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Components for Evolutionary Multitasking Experiments

Research Reagent	Function / Definition	Example Use-Case	Citation
Random Mating Probability (rmp)	A control parameter that determines the likelihood of crossover between individuals from different tasks.	In MFEA, a high rmp promotes knowledge transfer, while a low rmp encourages independent task evolution.	[1] [3]
Skill Factor (τ)	The one task, among all concurrent tasks, on which an individual in the population performs the best.	Used in scalar fitness calculation and to determine which task an offspring should be evaluated on.	[1]
Evolutionary Search Operator (ESO)	The algorithm (e.g., GA, DE, SBX) used to generate new candidate solutions from existing ones.	BOMTEA adaptively selects between GA and DE operators based on their performance on different tasks.	[3]
Scalar Fitness (φ)	A unified measure of an individual's performance across all tasks, allowing for cross-task comparison and selection.	Calculated as 1 / (the individual's best factorial rank over all tasks), enabling the selection of elites from a multi-task population.	[1]
Activity Dependence (AD)	A mechanism that allows a developed neural network to adjust internal parameters (e.g., bias, health) based on task performance feedback.	Enhances the learning and adaptability of evolved developmental neural networks for multitasking.	[4]
Online Resource Allocation	A dynamic strategy that assigns varying amounts of computational resources to different tasks based on their real-time performance.	Used in competitive multitasking (CMTEE) to focus resources on the most promising search trajectories.	[9]

Visualization: Evolutionary Competitive Multitasking

The following diagram illustrates the competitive multitasking paradigm used in applications like CMTEE, where tasks compete for computational resources.

Diagram 2: Competitive Multitasking with Resource Allocation

Evolutionary Multitask Optimization (EMTO) is a computational paradigm that mirrors a fundamental principle of natural evolution: the concurrent solution of multiple challenges. In nature, biological systems do not optimize for a single, isolated function but rather navigate a complex landscape of simultaneous pressures, including predator avoidance, resource acquisition, and mate selection. This process results in robust and adaptable organisms. Similarly, EMTO posits that similar or related optimization tasks can be solved more efficiently by leveraging knowledge gained from solving one task to accelerate the solution of others, rather than addressing each task in isolation [10]. This approach has demonstrated powerful scalability and search capabilities, finding application in diverse areas such as multi-objective optimization, combinatorial problems, and expensive optimization problems [10].

Within the specific context of neural network training, evolutionary algorithms (EAs) offer a compelling, gradient-free alternative to traditional backpropagation. Training biophysical neuron models provides significant insights into brain circuit organization and problem-solving capabilities. However, backpropagation often faces challenges like instability and gradient-related issues when applied to complex models. Evolutionary models, particularly when combined with mechanisms like heterosynaptic plasticity, present a robust alternative that can recapitulate brain-like dynamics during cognitive tasks [11]. This biological analogy extends beyond mere inspiration, offering tangible benefits in training versatile networks that achieve performance comparable to gradient-based methods on tasks ranging from MNIST classification to Atari games [11].

Theoretical Foundations and Biological Mechanisms

The operational principles of Evolutionary Multitasking are deeply rooted in metaphors of biological evolution. The population of candidate solutions undergoes a process of variation, selection, and reproduction, implicitly exchanging genetic material (knowledge) across tasks.

Core Biological Analogies

Population-based Search: Unlike traditional point-based algorithms, EMTO maintains a diverse population of individuals, each representing a potential solution. This diversity is crucial for exploring disparate regions of the search space concurrently, mirroring the genetic diversity within a species that enables adaptation to changing environments.
Knowledge as Genetic Material: In EMTO, the "knowledge" transferred between tasks is encoded within the genotypes of individuals. This is analogous to beneficial genetic traits that, once evolved in one context, can provide adaptive advantages in another, a phenomenon observed in horizontal gene transfer or shared ancestral traits.
Heterosynaptic Plasticity: Drawing from neuroscience, heterosynaptic plasticity is a biological mechanism where the modification of one synapse influences the strength of neighboring synapses. When integrated into evolutionary models, it aids network training by introducing a local, cooperative dynamic that stabilizes learning and prevents overspecialization, much like dendritic spine meta-plasticity in biological brains [11].

The Evolutionary Multitasking Framework

Formally, Evolutionary Multitask Optimization addresses Multiple Task Optimization Problems (MTOPs). The fundamental assumption is the existence of transferable knowledge across distinct optimization tasks. Through algorithmic operations that mimic crossover and mutation, knowledge is transferred, allowing the algorithm to use lessons learned in one task to speed up the solution of others [10]. The efficacy of this knowledge transfer hinges on three critical algorithmic components, which are active areas of research:

Knowledge Transfer Probability: Determining how often information should be exchanged between tasks.
Transfer Source Selection: Identifying which tasks are sufficiently "similar" to benefit from knowledge exchange.
Knowledge Transfer Mechanism: Defining the form in which knowledge is transferred (e.g., direct transfer of elite individuals, or mapping and transfer of population distribution) [10].

Experimental Protocols in Evolutionary Multitasking

To empirically validate the performance of evolutionary multitasking algorithms, rigorous experimental protocols are employed. The following section details the methodology for a benchmark experiment and a real-world application.

Protocol 1: Benchmarking the MGAD Algorithm

This protocol outlines the steps for evaluating a novel adaptive evolutionary multitask optimization algorithm, MGAD, against established benchmarks [10].

Objective: To assess the convergence speed and optimization ability of the MGAD algorithm on standardized multitask optimization problems.
Materials and Setup:
- Algorithms: The MGAD algorithm is compared against other state-of-the-art EMTO algorithms such as MFEA, MFEA-II, and EEMTA.
- Benchmark Problems: A suite of four established comparative benchmark problem sets for multitask optimization is used.
- Performance Metrics: Key metrics include convergence curves (to visualize speed), the final best objective function value achieved (to measure accuracy), and statistical tests (e.g., Wilcoxon signed-rank test) to confirm significance.
Procedure:
- Algorithm Configuration: Implement the MGAD algorithm with its core components: an enhanced adaptive knowledge transfer probability strategy, a source task selection mechanism using Maximum Mean Discrepancy (MMD) and Grey Relational Analysis (GRA), and an anomaly detection-based knowledge transfer strategy.
- Control Group Setup: Implement the comparison algorithms according to their published specifications.
- Experimental Run: For each benchmark problem set, execute all algorithms, ensuring an equal number of function evaluations for a fair comparison.
- Data Collection: Record the performance metrics for each algorithm run across multiple independent trials to account for stochasticity.
- Validation: Conduct a real-world validation experiment, such as applying the algorithms to a planar robotic arm control problem, to demonstrate practical utility.
Analysis: The results are analyzed to determine if MGAD exhibits statistically stronger competitiveness in convergence speed and optimization ability compared to the other algorithms.
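
To illustrate the similarity measure behind source task selection, the sketch below computes a (biased) estimate of the squared Maximum Mean Discrepancy between two populations with a Gaussian kernel and picks the candidate source whose population is closest to the target. It covers only the MMD part; how MGAD combines MMD with GRA is not reproduced here, and the kernel width `gamma`, the function names, and the toy populations are assumptions.

```python
import math


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel between two solution vectors."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between populations xs and ys."""
    def mean_kernel(ps, qs):
        return sum(rbf_kernel(p, q, gamma) for p in ps for q in qs) / (len(ps) * len(qs))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Toy populations in a shared 2-D unified search space (illustrative values).
target = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3]]
candidates = {
    "near_task": [[0.1, 0.0], [0.3, 0.2], [0.0, 0.2]],
    "far_task": [[2.0, 2.0], [2.2, 1.9], [1.8, 2.1]],
}

# Select the source task whose population distribution is closest to the target.
best_source = min(candidates, key=lambda name: mmd_squared(target, candidates[name]))
print(best_source)  # near_task
```

A smaller MMD suggests that the two populations occupy similar regions of the unified space, which is the intuition for preferring that task as a transfer source.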

Protocol 2: Evolutionary Bi-Level Neural Architecture Search

This protocol describes the application of a bi-level evolutionary approach to optimize neural networks for a specific task, such as color classification [5].

Objective: To simultaneously optimize the architecture, weights, and biases of a neural network using a bi-level optimization strategy, minimizing network complexity while maximizing predictive performance.
Materials and Setup:
- Dataset: A real-world dataset, such as a color classification dataset or the Wisconsin Diagnostic Breast Cancer (WDBC) dataset.
- Baseline Models: Traditional machine learning algorithms (e.g., SVM, Random Forest) and advanced models like Multilayer Perceptrons (MLPs) with extensive hyperparameter tuning.
- Evaluation Metrics: Predictive accuracy, model size (number of parameters), and computational cost during training.
Procedure:
- Define the Bi-Level Framework:
  - Upper-Level Optimizer: An evolutionary algorithm tasked with minimizing network complexity (e.g., number of neurons, connections), with its objective penalized according to the performance achieved at the lower level.
  - Lower-Level Optimizer: A training process (e.g., based on gradient descent or a simpler EA) that, for a given architecture from the upper level, minimizes the loss function (e.g., cross-entropy) to maximize predictive performance.
- Evolutionary Search: The upper-level EA generates populations of neural network architectures. For each architecture, the lower-level optimizer performs training, and the resulting performance is fed back to the upper level to guide selection, crossover, and mutation.
- Evaluation: The best-performing architecture discovered by the search process is evaluated on a held-out test set.
Analysis: Compare the predictive performance and model size of the evolved network against the baseline models. The success of the EB-LNAST approach is demonstrated by achieving superior or competitive predictive performance while reducing model size by up to 99.66% compared to traditional MLPs [5].
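
The nesting of the two levels can be illustrated with a deliberately small toy. This is a sketch of the bi-level idea only, not the EB-LNAST method from [5]: the upper level evolves a single architectural choice (the hidden-layer width of a one-hidden-layer tanh network), the lower level trains each candidate with per-sample gradient descent, and the upper-level objective (parameter count plus a penalty times the training loss) is an assumed form. The dataset, penalty weight, and all names are hypothetical.

```python
import math
import random


def train(hidden, data, rng, epochs=200, lr=0.02):
    """Lower level: train a 1-hidden-layer tanh network, return its final MSE."""
    w = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    v = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    c = 0.0

    def forward(x):
        acts = [math.tanh(w[j] * x + b[j]) for j in range(hidden)]
        return acts, sum(v[j] * acts[j] for j in range(hidden)) + c

    for _ in range(epochs):
        for x, y in data:
            acts, pred = forward(x)
            err = pred - y
            for j in range(hidden):
                grad_hidden = err * v[j] * (1.0 - acts[j] ** 2)  # uses v[j] before its update
                v[j] -= lr * err * acts[j]
                w[j] -= lr * grad_hidden * x
                b[j] -= lr * grad_hidden
            c -= lr * err
    return sum((forward(x)[1] - y) ** 2 for x, y in data) / len(data)


def upper_fitness(hidden, loss, penalty=100.0):
    """Upper level: parameter count, penalized by the lower-level loss (assumed form)."""
    n_params = 3 * hidden + 1  # w, b, v per hidden unit plus one output bias
    return n_params + penalty * loss


def bilevel_search(data, generations=5, pop_size=4, max_hidden=8, seed=0):
    rng = random.Random(seed)
    cache = {}

    def fitness(hidden):
        if hidden not in cache:  # each architecture is trained only once
            cache[hidden] = upper_fitness(hidden, train(hidden, data, rng))
        return cache[hidden]

    population = [rng.randint(1, max_hidden) for _ in range(pop_size)]
    for _ in range(generations):
        children = [min(max_hidden, max(1, h + rng.choice((-2, -1, 1, 2))))
                    for h in population]
        candidates = sorted(set(population + children))
        population = sorted(candidates, key=fitness)[:pop_size]
    best = population[0]
    return best, fitness(best)


# Hypothetical 1-D regression data standing in for a real dataset.
XS = [-1.0 + 2.0 * i / 19 for i in range(20)]
DATA = [(x, math.sin(3.0 * x)) for x in XS]
best_hidden, best_score = bilevel_search(DATA)
print(best_hidden, round(best_score, 3))
```

The penalty weight controls the trade-off: a larger value favors accuracy over compactness, while a smaller value pushes the search toward narrower networks.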

Performance Data and Comparative Analysis

The following tables summarize quantitative results from key experiments in evolutionary multitasking and neuroevolution, demonstrating the efficacy of the biological analogy.

Table 1: Performance Comparison of Evolutionary Multitasking Algorithms on Benchmark Problems [10]

Algorithm	Key Mechanism	Convergence Speed	Final Solution Quality	Remarks
MGAD	Anomaly detection transfer, MMD/GRA similarity	Fastest	Highest	Strong competitiveness; reduces negative transfer
MFEA-II	Dynamically adjusted RMP matrix	Moderate	High	Improves over MFEA with feedback
MFEA	Fixed knowledge transfer probability	Slower	Good	Foundational algorithm but limited adaptability
EEMTA	Feedback-based credit assignment	Moderate	Good	Explicit task selection

Table 2: Performance of Evolutionary Bi-Level Neural Architecture Search (EB-LNAST) on Color Classification [5]

Model / Approach	Predictive Performance (Accuracy)	Model Size (Parameters)	Reduction in Model Size vs. MLP
EB-LNAST (Proposed)	Statistically significant improvements	Optimized & Compact	Up to 99.66%
Traditional ML (e.g., SVM, RF)	Lower	N/A	N/A
Multilayer Perceptron (MLP)	Baseline	Large (Reference)	0%
MLP with Hyperparameter Tuning	Marginally higher (≤ 0.99%)	Large	0%

Table 3: Capabilities of Evolutionary Algorithms in Training Neural Models [11]

Network Type	Task Example	Performance vs. Gradient-Based Methods	Notable Characteristics
Spiking Neural Networks (SNNs)	MNIST Classification	Comparable	Recapitulates brain-like dynamics; high energy efficiency
Analog Neural Networks	Atari Games	Comparable	Gradient-free training avoids instability issues
Recurrent Architectures	Cognitive Tasks	Comparable	Incorporates dopamine-driven plasticity and memory replay

Implementation and Workflow Visualization

The practical implementation of evolutionary multitasking involves a structured workflow that manages the interaction between multiple tasks and the shared population. The following diagram illustrates the core operational loop of a typical Evolutionary Multitask Optimization algorithm.

Figure 1: Evolutionary Multitasking Core Workflow

The bi-level optimization framework for neural architecture search represents a specific and powerful instance of evolutionary multitasking, where one level of evolution is nested within another.

Figure 2: Bi-Level Optimization for Neural Architecture Search

The Scientist's Toolkit: Research Reagent Solutions

This section catalogs the essential computational "reagents" and materials required to implement and experiment with evolutionary multitasking algorithms as drawn from the cited research.

Table 4: Essential Research Reagents for Evolutionary Multitasking

Tool / Component	Category	Function / Purpose	Exemplar Use Case
Evolutionary Multitask Optimization (EMTO) Framework	Algorithmic Paradigm	Provides the overarching structure for concurrent task solving via knowledge transfer.	Solving Multiple Task Optimization Problems (MTOPs) [10].
Multi-Factorial Evolutionary Algorithm (MFEA)	Base Algorithm	A foundational EMTO algorithm that enables implicit knowledge transfer via a unified search space.	Baseline for developing and testing new EMTO strategies [10].
Maximum Mean Discrepancy (MMD)	Similarity Metric	Statistically measures the similarity between the probability distributions of two task populations.	Used in MGAD for improved transfer source selection [10].
Grey Relational Analysis (GRA)	Similarity Metric	Measures the similarity of evolutionary trends between tasks based on the geometry of their solutions.	Used in MGAD in conjunction with MMD for source selection [10].
Anomaly Detection Strategy	Knowledge Filter	Identifies and filters out potentially deleterious or "negative" knowledge before transfer.	Core component of MGAD to reduce the risk of negative transfer [10].
Heterosynaptic Plasticity Model	Neuro-Inspired Mechanism	A local learning rule where the change in one synapse affects neighbors, stabilizing learning.	Integrated into EAs for training more robust, brain-like neural networks [11].
Bi-Level Optimization Framework	Search Architecture	Hierarchically separates architecture search (upper-level) from parameter training (lower-level).	Evolutionary Neural Architecture Search (EB-LNAST) [5].

The training of sophisticated neural networks, particularly within high-stakes fields like drug development, is often hampered by complex, multi-modal loss landscapes and conflicting objectives. Traditional gradient-based optimizers are prone to becoming trapped in suboptimal local minima, while conventional evolutionary algorithms can suffer from slow convergence speeds. Evolutionary Multitasking (EMT) has emerged as a transformative paradigm that leverages synergies across multiple, related optimization tasks to overcome these hurdles. By enabling the simultaneous solving of several tasks within a single algorithmic run, EMT facilitates implicit knowledge transfer, which serves as a powerful mechanism for accelerating convergence and escaping poor local optima. This application note details the key advantages of EMT, provides validated experimental data, and outlines detailed protocols for its implementation in neural network training for scientific discovery.

Key Advantages and Quantitative Evidence

Evolutionary Multitasking provides two fundamental benefits for neural network training and optimization in complex scientific problems.

Convergence Acceleration: The transfer of genetic material (e.g., promising synaptic weights or architectural features) from one task to another provides a form of guided initialization and exploration. This knowledge sharing prevents the algorithm from starting from scratch for each new task, effectively "warming up" the search process and significantly reducing the number of function evaluations required to find a high-quality solution [12] [13]. For instance, a population evolved on a related protein folding prediction task can pass useful genetic material to the population training a network for a new target protein.
Enhanced Solution Quality in Complex Landscapes: Multi-modal and non-convex loss landscapes, common in physics-informed neural networks (PINNs) and drug response prediction models, are challenging for gradient-based methods. The population-based nature of EMT, combined with cross-task knowledge transfer, promotes diverse exploration of the solution space. This helps the algorithm bypass deceptive local minima and discover more robust and generalizable solutions [14] [5]. This is critical for ensuring that neural network predictions are not only accurate but also physically consistent and reliable.

The table below summarizes empirical results from recent studies that demonstrate these advantages across various applications.

Table 1: Quantitative Performance of Evolutionary Multitasking and Related Algorithms

Algorithm / Study	Application Context	Key Metric Improvement	Reported Advantage
EMOPPO-TML [15]	Wireless Rechargeable Sensor Networks	Convergence Speed	LSTM-enhanced policy network achieved 25% faster convergence compared to conventional neural networks.
EMOPPO-TML [15]	Wireless Rechargeable Sensor Networks	Energy Usage Efficiency	LSTM integration improved long-term decision-making by 10% compared to standard PPO.
HRL-MOEA [13]	Multi-objective Recommendation Systems	Evolutionary Efficacy & Convergence	Hybrid RL strategy (SARSA & Q-learning) dynamically adapted genetic operators, enhancing convergence speed and solution quality.
EB-LNAST [5]	Color Classification & Medical Diagnostics (WDBC)	Model Compactness	Achieved up to 99.66% reduction in model size while maintaining competitive predictive performance (marginal reduction of ≤ 0.99%).

Experimental Protocols

This section provides a detailed methodology for replicating key experiments that validate the advantages of Evolutionary Multitasking.

Protocol: Evaluating Convergence Acceleration in a Multi-Task PINN Scenario

This protocol assesses the performance of EMT in optimizing Physics-Informed Neural Networks (PINNs) for a family of related partial differential equations (PDEs), a common scenario in drug delivery modeling.

1. Objective: To compare the convergence speed and solution accuracy of an EMT algorithm against a traditional single-task evolutionary optimizer when training PINNs for multiple PDEs with varying parameters.
2. Materials & Software:
- Benchmark Problems: A suite of two related PDEs, e.g., the Burgers' equation with different viscosity parameters [14].
- Algorithms:
  - Experimental Group: A Multi-factorial Evolutionary Algorithm (MFEA) or similar EMT framework.
  - Control Group: A standard Genetic Algorithm (GA) or Evolution Strategy (ES) run independently on each task.
- Software Framework: DeepXDE [14] or PyTorch/TensorFlow for PINN implementation, with a custom EMT implementation, optionally built on a general-purpose evolutionary optimization library (e.g., PyGMO).
- Hardware: A computing cluster with multiple GPUs (e.g., NVIDIA V100 or A100) to handle parallel training of the population.
3. Experimental Procedure:
- Problem Formulation: Define the loss function for each PINN task, combining data fidelity terms and PDE residual terms as described in [14].
- Parameter Mapping: In the MFEA, encode the shared and task-specific components of the PINN's weights and biases into a unified representation.
- Algorithm Configuration:
  - MFEA: Set a random mating probability (e.g., rmp = 0.3) to control cross-task crossover.
  - GA & MFEA: Use identical population sizes (e.g., 100 individuals), crossover, and mutation rates for a fair comparison.
- Termination Criterion: Run all algorithms for a fixed budget of 200,000 function evaluations [12].
- Data Collection: Record the best and median loss value for each task at every 1,000 evaluations. Perform 30 independent runs with different random seeds [12].
4. Data Analysis:
- Plot the average convergence curves (loss vs. evaluations) for both algorithms across all tasks.
- Statistically compare the number of evaluations required by each algorithm to reach a pre-defined loss threshold using a Wilcoxon signed-rank test.
- The MFEA is expected to demonstrate steeper convergence and reach the threshold in fewer evaluations than the independent GAs.
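
The analysis steps above can be sketched with the standard library alone. The first helper converts a recorded convergence curve into an evaluations-to-threshold value; the second computes the Wilcoxon signed-rank statistic for paired runs with a two-sided p-value from the normal approximation (no tie or continuity correction, which is a simplification). All numbers are made-up placeholders that show the data layout, not experimental results, and the function names are illustrative.

```python
import math


def evaluations_to_threshold(curve, threshold, step=1000):
    """curve[i] is the best loss after (i + 1) * step evaluations.

    Returns the evaluation count at the first crossing, or None if never reached.
    """
    for i, loss in enumerate(curve):
        if loss <= threshold:
            return (i + 1) * step
    return None


def wilcoxon_signed_rank(a, b):
    """Paired Wilcoxon signed-rank test (normal approximation, two-sided).

    Returns (w_plus, w_minus, p_value). Zero differences are dropped and
    tied absolute differences receive average ranks.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (min(w_plus, w_minus) - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return w_plus, w_minus, p_value


# Placeholder curve: loss recorded every 1,000 evaluations (not measured data).
print(evaluations_to_threshold([0.9, 0.5, 0.09, 0.01], threshold=0.1))  # 3000

# Placeholder evaluations-to-threshold per paired run (same seed for both methods).
mfea_evals = [41000, 38000, 45000, 39000, 42000, 40000]
ga_evals = [52000, 49000, 50000, 61000, 47000, 55000]
w_plus, w_minus, p_value = wilcoxon_signed_rank(mfea_evals, ga_evals)
print(w_plus, w_minus, round(p_value, 4))
```

For the 30-run setting of the protocol, a library routine such as `scipy.stats.wilcoxon` would normally be preferred, since it handles ties and small samples more carefully; runs that never reach the threshold also need an explicit convention (for example, assigning the full budget).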

The following diagram illustrates the core workflow and knowledge transfer mechanism of this EMT protocol.

Protocol: Assessing Solution Quality on a Drug Response Prediction Problem

This protocol evaluates the ability of EMT to find superior solutions for a complex multi-objective problem in drug development, such as balancing prediction accuracy with model fairness or robustness.

1. Objective: To compare the solution quality (Pareto front) of an EMT algorithm against a standard Multi-Objective Evolutionary Algorithm (MOEA) on a graph neural network (GNN) configured for drug response prediction.
2. Materials & Software:
- Dataset: A public drug response dataset (e.g., GDSC or TCGA), formatted as a graph structure where nodes represent genes/cells and edges represent interactions.
- Model: A Graph Neural Network (GNN) whose architecture and training hyperparameters are to be optimized [5].
- Algorithms:
  - Experimental Group: An EMT algorithm like MFEA adapted for multi-objective optimization (MO-MFEA) [12].
  - Control Group: A classical MOEA such as NSGA-II.
- Software: Deep Graph Library (DGL) or PyTorch Geometric, with an optimization framework like pymoo.
3. Experimental Procedure:
- Task Definition: Define two or more related tasks. For example:
  - Task 1: Optimize the GNN for a specific cancer type.
  - Task 2: Optimize the same GNN for a different, but genetically similar, cancer type.
- Objective Functions: For each task, the objectives are to maximize predictive accuracy (e.g., R²) and minimize model complexity (number of parameters) to ensure deployability.
- Execution: Run both MO-MFEA and NSGA-II for a fixed number of generations (e.g., 500). Use identical population sizes and evaluation budgets.
- Evaluation: Upon termination, collect the final non-dominated solution set (Pareto front) from each algorithm and run.
4. Data Analysis:
- Calculate the Hypervolume (HV) metric for the obtained Pareto fronts to measure both convergence and diversity.
- Compare the average HV of MO-MFEA against NSGA-II across 30 independent runs. A statistically significantly higher HV for MO-MFEA would indicate a superior ability to find a diverse set of high-quality solutions.
- The knowledge transfer in MO-MFEA is expected to help discover GNN architectures that are both accurate and efficient across multiple cancer types, outperforming the isolated optimization of NSGA-II.
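For two objectives, the Hypervolume metric named above reduces to a sum of rectangle areas. The sketch below assumes both objectives are minimized (so accuracy would first be converted to an error such as 1 − R²) and that a reference point worse than every solution has been chosen; the function names and the example points are illustrative assumptions.

```python
def non_dominated(points):
    """Keep the points not dominated by any other (both objectives minimized)."""
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))  # ascending in f1, hence descending in f2


def hypervolume_2d(points, ref):
    """Area dominated by the front and bounded by the reference point `ref`."""
    front = [p for p in non_dominated(points) if p[0] < ref[0] and p[1] < ref[1]]
    hv = 0.0
    prev_f2 = ref[1]
    for f1, f2 in front:
        # Each point adds the rectangle between its f2 and the previous f2.
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv


# Hypothetical (error, complexity) pairs; (3, 3) is dominated and ignored.
solutions = [(1, 3), (2, 2), (3, 1), (3, 3)]
print(hypervolume_2d(solutions, ref=(4, 4)))
```

The same reference point must be used for both algorithms and all runs, otherwise the HV values are not comparable.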

The Scientist's Toolkit: Research Reagent Solutions

The following table catalogues essential algorithmic "reagents" for designing and implementing Evolutionary Multitasking experiments in neural network training.

Table 2: Key Research Reagents for Evolutionary Multitasking Experiments

Research Reagent	Function & Explanation	Representative Use-Cases
Multi-factorial Evolutionary Algorithm (MFEA)	The core algorithmic framework that evolves a single population of individuals, each encoded to solve multiple tasks simultaneously.	General-purpose multi-task optimization across diverse domains like PINNs [14] and neural architecture search [5].
Random Mating Probability (RMP)	A critical hyperparameter that controls the probability of crossover between individuals from different tasks. A low RMP limits transfer, a high one may cause negative interference.	Tuning knowledge transfer intensity in MFEA; essential for balancing exploration and exploitation [12] [13].
Hybrid RL-Adaptive Strategy (e.g., HRL-MOEA)	Uses reinforcement learning (e.g., SARSA & Q-learning) to dynamically adapt genetic operator probabilities during evolution, replacing fixed, hand-tuned parameters.	Enhancing convergence performance in complex multi-objective recommendation systems [13]; adaptable to drug discovery pipelines.
Bi-level Optimization Framework (e.g., EB-LNAST)	A hierarchical approach where an upper-level optimizer (e.g., for architecture) guides a lower-level optimizer (e.g., for weights).	Simultaneously discovering optimal neural network architectures and their training parameters for tasks like color classification [5].
Long Short-Term Memory (LSTM) Policy Network	An advanced neural network component within an evolutionary agent that helps capture temporal dependencies in decision-making.	Improving long-term performance and energy usage efficiency in sequential decision problems like path planning for mobile chargers [15].

Evolutionary multitasking represents a paradigm shift in computational intelligence, leveraging the implicit parallelism of population-based search to solve multiple optimization tasks simultaneously [12]. Within the domain of neural network training, this approach facilitates efficient knowledge transfer between related tasks, accelerating convergence and improving generalization in complex models such as those used in drug discovery [16]. This framework is particularly valuable for high-dimensional problems including feature selection for biological data and optimization of network architectures, where it demonstrates superior performance compared to traditional isolated optimization methods [12] [16].

The conceptual foundation lies in mimicking evolutionary processes, where genetic material evolved for one task may prove beneficial for another, thereby creating a synergistic optimization environment [12]. When applied to neural network training, this enables the discovery of robust network parameters and architectures through implicit transfer of learned features and representations across related modeling tasks.

Theoretical Foundations

Evolutionary Multitasking Principles

Evolutionary multitasking operates on the principle that simultaneously solving multiple optimization tasks can induce cross-task genetic transfers that accelerate evolutionary progression toward superior solutions [12]. In biological terms, evolution itself functions as a massive multi-task engine where diverse organisms simultaneously evolve to survive in various ecological niches [12].

The mathematical formulation for multi-objective feature selection, a common neural network preprocessing task, illustrates this principle well [16]. The optimization problem is defined as:

$$\text{Minimize } F(x) = \big(f_1(x),\, f_2(x)\big) \quad \text{subject to } x \in \Omega$$

where $f_1(x)$ represents the number of selected features and $f_2(x)$ denotes the classification error rate [16].
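To make the two objectives concrete, the following sketch evaluates a binary feature mask on a toy dataset. The classifier is an assumption chosen for brevity (leave-one-out 1-nearest-neighbour); the source does not specify which classifier produces the error rate, and the data values are made up.

```python
def feature_selection_objectives(mask, X, y):
    """Return (f1, f2) for a binary feature mask.

    f1: number of selected features.
    f2: leave-one-out error rate of a 1-nearest-neighbour classifier
        restricted to the selected features (illustrative choice).
    """
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 0, 1.0  # no features selected: count every prediction as wrong
    errors = 0
    for i, xi in enumerate(X):
        best_dist, best_label = None, None
        for k, xk in enumerate(X):
            if k == i:
                continue
            dist = sum((xi[j] - xk[j]) ** 2 for j in selected)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, y[k]
        errors += best_label != y[i]
    return len(selected), errors / len(X)


def dominates(a, b):
    """Pareto dominance for minimization: a is no worse everywhere, better somewhere."""
    return all(u <= v for u, v in zip(a, b)) and any(u < v for u, v in zip(a, b))


# Toy data: feature 0 separates the classes, feature 1 is misleading noise.
X = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.1], [1.1, 0.9]]
y = [0, 0, 1, 1]
small = feature_selection_objectives([1, 0], X, y)
full = feature_selection_objectives([1, 1], X, y)
print(small, full, dominates(small, full))
```

In this toy case the single-feature subset dominates the full set on both objectives, which is the kind of trade-off the Pareto formulation is meant to expose.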

Neural Network Synergy

When integrated with neural networks, evolutionary multitasking provides a mechanism for parallel optimization of both network architecture and parameters across related domains. This synergy is particularly valuable for:

Architecture Search: Simultaneously evolving network topologies for multiple related tasks
Parameter Transfer: Enabling knowledge sharing between networks solving complementary problems
Regularization: Implicitly preventing overfitting through cross-task validation

Experimental Protocols

Benchmarking Standards

Rigorous evaluation of evolutionary multitasking algorithms requires standardized benchmarks and protocols. The CEC 2025 Competition on Evolutionary Multi-task Optimization establishes comprehensive guidelines for performance assessment [12].

Protocol Requirements:

Execute 30 independent runs with different random seeds
Record best function error values (BFEV) at predefined evaluation checkpoints
For 2-task problems: use 200,000 maximum function evaluations (maxFEs)
For 50-task problems: use 5,000,000 maxFEs [12]

Performance Metrics:

Calculate median BFEV across all runs for each computational budget checkpoint
Evaluate on standardized test suites containing nine complex MTO problems
Assess algorithm performance across varying computational budgets [12]
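The median-BFEV aggregation described above can be sketched as follows. The data layout (`runs[r][c]` holding the BFEV of run `r` at checkpoint `c`) and the numbers are assumptions for illustration; a real experiment would use 30 runs.

```python
from statistics import median


def median_bfev(runs):
    """Return the per-checkpoint median of best function error values.

    runs[r][c] is the BFEV recorded by run r at checkpoint c.
    """
    n_checkpoints = len(runs[0])
    if any(len(run) != n_checkpoints for run in runs):
        raise ValueError("all runs must report the same checkpoints")
    return [median(run[c] for run in runs) for c in range(n_checkpoints)]


# Three hypothetical runs with three checkpoints each.
runs = [
    [4.0, 2.0, 1.0],
    [5.0, 3.0, 0.5],
    [3.0, 2.5, 0.8],
]
print(median_bfev(runs))
```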

Dual-Perspective Feature Selection Methodology

The DREA-FS algorithm demonstrates the application of evolutionary multitasking to feature selection for neural network training [16]. This protocol specifically addresses high-dimensional data challenges common in drug development.

Experimental Workflow:

Task Construction: Create simplified and complementary tasks using filter-based and group-based dimensionality reduction
Dual-Archive Optimization:
- Diversity Archive: Preserves feature subsets with equivalent performance
- Elite Archive: Provides convergence guidance
Knowledge Transfer: Implement cross-task genetic transfers through specialized reproduction operators
Solution Refinement: Balance convergence and diversity across tasks to identify multimodal solutions [16]
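One way to picture the dual-archive idea in the workflow above is an elite archive of non-dominated objective vectors paired with a diversity archive that keeps every distinct feature subset reaching the same objective values. The class below is a simplified sketch under that assumption; it is not the update rule of DREA-FS itself, and the class and method names are hypothetical.

```python
def dominates(a, b):
    """Pareto dominance for minimization."""
    return all(u <= v for u, v in zip(a, b)) and any(u < v for u, v in zip(a, b))


class DualArchive:
    def __init__(self):
        self.elite = {}      # objective vector -> one representative subset
        self.diversity = {}  # objective vector -> set of equivalent subsets

    def add(self, subset, objectives):
        """Try to store a feature subset; return True if it was accepted."""
        subset = frozenset(subset)
        objectives = tuple(objectives)
        if any(dominates(o, objectives) for o in self.elite):
            return False  # dominated by a stored elite: reject
        # Drop stored entries that the newcomer dominates.
        for o in [o for o in self.elite if dominates(objectives, o)]:
            del self.elite[o]
            del self.diversity[o]
        self.elite.setdefault(objectives, subset)
        self.diversity.setdefault(objectives, set()).add(subset)
        return True


archive = DualArchive()
archive.add({0, 1}, (2, 0.1))
archive.add({2, 3}, (2, 0.1))   # different subset, same objective values
print(len(archive.diversity[(2, 0.1)]))
```

Keeping equivalent subsets side by side is what allows multimodal solutions to be reported: two different sets of features with the same size and error are both retained.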

Validation Framework:

Test on 21 real-world datasets with varying dimensionality
Compare classification performance against state-of-the-art multi-objective algorithms
Evaluate ability to identify distinct feature subsets with equivalent objective values [16]

Implementation Framework

Computational Infrastructure

The successful implementation of evolutionary multitasking for neural networks requires specialized computational frameworks that balance expressiveness with efficiency [17].

Table 1: Deep Learning Frameworks Supporting Evolutionary Multitasking Research

Framework	Primary Strength	Execution Model	Hardware Support	Research Suitability
PyTorch	Research flexibility, dynamic graphs	Dynamic computation	Multi-GPU, distributed	Excellent for prototyping novel architectures [18]
TensorFlow	Production deployment, scalability	Eager by default; graph compilation via tf.function	TPU, GPU, mobile	Strong for large-scale experiments [19]
JAX	High-performance computing	JIT compilation, functional	TPU, GPU	Ideal for evolutionary algorithm research [18]
Keras	Rapid prototyping	High-level API abstraction	GPU via TensorFlow	Excellent for quick experimentation [19]

Research Reagent Solutions

Table 2: Essential Research Components for Evolutionary Multitasking Neural Networks

Component	Function	Implementation Examples
Multi-factorial Evolutionary Algorithm (MFEA)	Enables simultaneous optimization of multiple tasks	MFEA framework for knowledge transfer between tasks [12]
Dual-Archive Mechanism	Maintains convergence and diversity	DREA-FS diversity and elite archives for feature selection [16]
Dimensionality Reduction	Creates simplified auxiliary tasks	Filter-based and group-based reduction for high-dimensional data [16]
Benchmark Test Suites	Standardized performance evaluation	CEC 2025 MTSOO and MTMOO problem sets [12]
Performance Metrics	Quantifies algorithm effectiveness	Best Function Error Value (BFEV), Inverted Generational Distance (IGD) [12]

Visualization Framework

[Figure: Evolutionary Multitasking Architecture]

[Figure: DREA-FS Experimental Workflow]

Comparative Analysis

Performance Benchmarking

Table 3: Evolutionary Multitasking Algorithm Performance Comparison

Algorithm	Feature Selection Accuracy	Convergence Speed	Multimodal Solution Diversity	Computational Complexity
DREA-FS	Superior (21 datasets)	Accelerated through knowledge transfer	High (dual-archive mechanism)	Moderate (balanced approach) [16]
Traditional MOFS	Moderate	Slow convergence cited as limitation	Limited	Low to moderate [16]
Single-Objective EMT	Varies with weighting scheme	Fast but limited scope	Minimal (single solution)	Low [16]
MFEA Baseline	Competitive on select tasks	Standard evolutionary pace	Moderate	Moderate [12]

The integration of evolutionary multitasking with neural network training establishes a powerful framework for addressing complex optimization challenges in domains such as drug development. The DREA-FS algorithm exemplifies this approach, demonstrating significant improvements in feature selection performance while identifying multiple equivalent solutions that enhance interpretability [16]. Standardized benchmarking protocols, as outlined in the CEC 2025 competition, provide the necessary foundation for rigorous evaluation and continued advancement in this field [12].

Future research directions should focus on scaling these approaches to ultra-high-dimensional problems, enhancing cross-task knowledge transfer mechanisms, and developing more efficient diversity preservation techniques. The synergy between evolutionary computation and neural networks continues to offer promising avenues for addressing increasingly complex real-world optimization challenges.

Building and Applying EMT Frameworks to Drug Discovery Challenges

Multi-Factorial Evolutionary Algorithms (MFEAs) represent a paradigm shift in evolutionary computation, enabling the simultaneous solution of multiple optimization tasks within a single unified search process. The core innovation of MFEA lies in its ability to transfer knowledge across tasks implicitly through a unified genetic representation and crossover operations, thereby leveraging synergies and complementarities between tasks to accelerate convergence and improve solution quality [20] [21]. This multifactorial inheritance framework stands in contrast to traditional evolutionary approaches that handle optimization problems in isolation, making it particularly valuable for complex real-world domains where multiple related problems must be addressed concurrently [21].

In the context of drug discovery, MFEAs offer transformative potential by enabling researchers to optimize multiple molecular properties, predict various biological activities, and explore diverse chemical spaces simultaneously. The pharmaceutical industry faces enormous challenges in navigating high-dimensional optimization landscapes where efficacy, specificity, toxicity, and synthesizability must be balanced [22] [23]. MFEA provides a robust computational framework for addressing these multifactorial challenges through intelligent knowledge transfer between related drug discovery tasks, potentially reducing development timelines and costs while improving success rates [24] [25].

Foundational Concepts and Mechanisms

Core MFEA Architecture

The MFEA architecture operates on the principle of implicit genetic transfer through a unified search space. Unlike traditional evolutionary algorithms that maintain separate populations for separate tasks, MFEA maintains a single population where each individual possesses a skill factor indicating its task affinity alongside a multifactorial fitness that represents its performance across all tasks [21]. This design enables the automatic discovery and exploitation of genetic material that proves beneficial across multiple tasks through crossover operations between individuals with different skill factors [20].

The algorithm incorporates two fundamental components: (1) a multifactorial fitness evaluation that assesses solutions across all tasks, and (2) assortative mating that preferentially crosses individuals with similar skill factors while allowing controlled cross-task recombination [21]. This balanced approach maintains task specialization while permitting beneficial knowledge transfer. The recent introduction of multipopulation MFEA variants further enhances this framework by employing multiple subpopulations with adaptive migration strategies, allowing more controlled knowledge exchange and better management of negative transfer between dissimilar tasks [20].
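The skill factor and multifactorial fitness can be computed from a cost matrix as in the canonical MFEA formulation: each individual is ranked per task, its skill factor is the task on which it ranks best, and its scalar fitness is the reciprocal of that best rank. The sketch below assumes every individual has a cost on every task; the function name and the numbers are illustrative.

```python
def multifactorial_fitness(costs):
    """Compute scalar fitness and skill factor from a cost matrix.

    costs[i][t] is the cost of individual i on task t (lower is better).
    Returns (scalar_fitness, skill_factor), one entry per individual.
    """
    n, k = len(costs), len(costs[0])
    ranks = [[0] * k for _ in range(n)]
    for t in range(k):
        order = sorted(range(n), key=lambda i: costs[i][t])
        for rank, i in enumerate(order, start=1):
            ranks[i][t] = rank  # factorial rank of individual i on task t
    skill = [min(range(k), key=lambda t: ranks[i][t]) for i in range(n)]
    scalar = [1.0 / ranks[i][skill[i]] for i in range(n)]
    return scalar, skill


# Three individuals, two tasks (made-up costs).
costs = [[0.1, 9.0], [0.5, 1.0], [0.9, 2.0]]
scalar, skill = multifactorial_fitness(costs)
print(scalar, skill)
```

Because scalar fitness depends only on ranks, tasks with very different cost scales can share one population without normalization.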

Knowledge Transfer Mechanisms

Effective knowledge transfer constitutes the core advantage of MFEA over single-task evolutionary approaches. The transfer occurs implicitly through crossover operations between individuals from different tasks, allowing beneficial genetic material to propagate across the search spaces of related optimization problems [21]. This mechanism enables the algorithm to discover underlying commonalities between tasks and utilize them to escape local optima and accelerate convergence.

Advanced MFEA implementations incorporate adaptive knowledge transfer mechanisms that dynamically regulate the intensity and direction of genetic exchange based on measured transfer effectiveness [20]. These approaches monitor the performance improvement attributable to cross-task crossover and adjust migration rates between subpopulations accordingly, thereby maximizing positive transfer while minimizing potential negative interference between conflicting tasks. This adaptability proves particularly valuable in drug discovery applications where the relationships between different molecular optimization tasks may not be known a priori [24] [25].

MFEA Design Protocols for Drug Discovery

Representation Strategies for Molecular Optimization

The design of effective representation schemes constitutes a critical foundation for successful MFEA implementation in drug discovery. The Network Random Key (NetKey) representation provides a flexible approach that accommodates both complete and sparse graph-based molecular representations, making it suitable for diverse drug discovery tasks ranging from molecular graph optimization to chemical reaction planning [20]. This representation encodes solutions as vectors of random numbers that are subsequently decoded into actual structures through a deterministic mapping process, allowing standard evolutionary operators to be applied while maintaining structural feasibility.
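The decoding step can be illustrated with a generic random-key scheme: one key per candidate edge, edges visited in order of decreasing key, and an edge accepted only if it does not close a cycle. The sketch assumes a complete graph and a spanning-tree target; the details of the NetKey variant in [20] may differ, and the function name and key convention are assumptions.

```python
from itertools import combinations


def decode_random_keys(keys, n_nodes):
    """Decode one random key per edge of a complete graph into a spanning tree."""
    edges = list(combinations(range(n_nodes), 2))
    if len(keys) != len(edges):
        raise ValueError("need one key per edge of the complete graph")
    parent = list(range(n_nodes))  # union-find forest for cycle detection

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    tree = []
    # Assumed convention: a higher key gives the edge higher priority.
    for _, (u, v) in sorted(zip(keys, edges), reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((u, v))
    return tree


# Edge order for 4 nodes: (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
keys = [0.9, 0.8, 0.1, 0.7, 0.2, 0.6]
print(decode_random_keys(keys, 4))
```

Any real-valued key vector decodes to a valid tree, which is why standard crossover and mutation can be applied to the keys without a repair step.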

For molecular property optimization, multitask graph representations enable simultaneous optimization of multiple pharmacological properties by sharing substructural patterns across related tasks [24]. This approach leverages the observation that certain molecular scaffolds or functional groups confer desirable properties across multiple optimization objectives, allowing knowledge about promising chemical motifs to transfer implicitly between tasks through the evolutionary process.

Experimental Protocol: Multi-Task Molecular Optimization

Objective: Simultaneously optimize multiple drug properties including target binding affinity, solubility, and metabolic stability.

Materials and Reagents:

Chemical Libraries: Curated compound collections (e.g., ZINC, ChEMBL)
Descriptor Software: RDKit or OpenBabel for molecular feature generation
Validation Assays: In silico prediction models or high-throughput screening data

Procedure:

Task Definition: Define 3-5 related drug optimization tasks with shared molecular representation.
Population Initialization: Initialize population of 500-1000 individuals with diverse skill factors.
Multifactorial Evaluation:
- Decode each individual to molecular representation
- Evaluate on assigned task using relevant objective functions
- Compute multifactorial rank considering performance across all tasks
Assortative Mating:
- Select parents with 70% probability for same-task mating
- Allow 30% cross-task mating with adaptive transfer control
Evolutionary Operators:
- Apply simulated binary crossover with distribution index of 15
- Implement polynomial mutation with probability 1/n (n: number of variables)
Skill Factor Assignment: Assign offspring to task demonstrating highest fitness improvement.
Termination Check: Continue for 100-200 generations or until convergence criteria met.
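The mating and crossover steps of the procedure above can be sketched as follows. The sketch interprets the 30% cross-task setting as a random mating probability gate (`rmp`), uses simulated binary crossover with distribution index 15, and assumes genes are normalized to [0, 1]. Mutation and skill factor assignment are omitted, and the function names are hypothetical.

```python
import random


def sbx_pair(x1, x2, eta, rng):
    """Simulated binary crossover on a single pair of real-valued genes."""
    u = rng.random()
    if u <= 0.5:
        beta = (2.0 * u) ** (1.0 / (eta + 1.0))
    else:
        beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
    c1 = 0.5 * ((1 + beta) * x1 + (1 - beta) * x2)
    c2 = 0.5 * ((1 - beta) * x1 + (1 + beta) * x2)
    return c1, c2


def mate(p1, p2, skill1, skill2, rmp=0.3, eta=15.0, rng=random):
    """Return two offspring genomes, or None if a cross-task pair is refused."""
    if skill1 != skill2 and rng.random() >= rmp:
        return None  # caller would fall back to mutating each parent
    pairs = [sbx_pair(a, b, eta, rng) for a, b in zip(p1, p2)]
    # Clip to the assumed unified search space [0, 1].
    child1 = [min(1.0, max(0.0, c[0])) for c in pairs]
    child2 = [min(1.0, max(0.0, c[1])) for c in pairs]
    return child1, child2


rng = random.Random(42)
print(mate([0.2, 0.4], [0.8, 0.6], skill1=0, skill2=0, rng=rng))
```

A larger distribution index keeps offspring closer to their parents; before clipping, SBX preserves the mean of each parent gene pair.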

Validation: Confirm optimized molecules through molecular dynamics simulations and in vitro assays.

Advanced MFEA Configurations

Multipopulation Adaptive MFEA

The multipopulation MFEA variant addresses limitations of single-population approaches by maintaining distinct subpopulations for different tasks while enabling controlled knowledge exchange through periodic migration [20]. This architecture proves particularly beneficial for drug discovery applications where tasks may have partially conflicting objectives or different computational expense characteristics.

Implementation Protocol:

Subpopulation Initialization: Initialize separate subpopulations of 200-500 individuals per task.
Migration Policy: Implement adaptive migration where number of migrating individuals adjusts based on measured transfer effectiveness.
Interval Determination: Conduct migration every 10-15 generations to allow sufficient local convergence.
Elite Preservation: Protect top 10% performers in each subpopulation from replacement by migrants.
Negative Transfer Monitoring: Track performance degradation attributable to migration and adjust policy accordingly.
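A simple controller for the adaptive migration and elite preservation steps above might look like the sketch below. The success-rate thresholds, step size, and cap are illustrative assumptions; the source specifies only that the migrant count adapts to measured transfer effectiveness and that the top 10% are protected.

```python
def update_migrants(current, improved, total, low=0.2, high=0.5,
                    step=1, max_migrants=20):
    """Adjust the migrant count from the share of past migrants that helped.

    improved / total is the fraction of previous migrants that improved
    the receiving subpopulation (how this is measured is left open).
    """
    if total == 0:
        return current
    success = improved / total
    if success > high:
        return min(max_migrants, current + step)
    if success < low:
        return max(0, current - step)  # likely negative transfer: back off
    return current


def replace_worst(population, migrants, elite_frac=0.1):
    """Insert migrants over the worst individuals, never touching the elite.

    population and migrants are lists of (fitness, genome); higher is better.
    """
    ranked = sorted(population, key=lambda ind: ind[0], reverse=True)
    n_elite = max(1, int(len(ranked) * elite_frac))
    n_replace = min(len(migrants), len(ranked) - n_elite)
    return ranked[: len(ranked) - n_replace] + list(migrants[:n_replace])


print(update_migrants(5, improved=8, total=10))
```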

Hybrid MFEA with Surrogate Modeling

The integration of surrogate models with MFEA creates a powerful framework for drug discovery applications involving computationally expensive fitness evaluations, such as molecular dynamics simulations or quantum chemistry calculations [26]. This approach substitutes expensive function evaluations with efficient data-driven models during initial search phases, reserving precise evaluations for promising regions.
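The pre-screening idea can be sketched with any cheap predictor. The example below swaps in a nearest-neighbour surrogate purely for brevity (it stands in for the data-driven models mentioned above) and spends the expensive evaluation only on the candidates the surrogate ranks best. All names and values are hypothetical.

```python
def prescreen(candidates, archive, expensive_eval, top_k):
    """Evaluate exactly only the top_k candidates ranked by a surrogate.

    archive: list of (genome, true_cost) pairs that were evaluated exactly.
    Returns the archive extended with the newly evaluated candidates.
    """
    def surrogate(x):
        # Predict the cost of x as the cost of its nearest archived genome.
        nearest = min(
            archive,
            key=lambda rec: sum((a - b) ** 2 for a, b in zip(x, rec[0])),
        )
        return nearest[1]

    ranked = sorted(candidates, key=surrogate)  # lower predicted cost first
    new_records = [(x, expensive_eval(x)) for x in ranked[:top_k]]
    return archive + new_records


calls = []

def expensive_eval(x):
    """Stand-in for a costly simulation; records how often it is called."""
    calls.append(x)
    return x[0] ** 2

archive = [([0.0], 0.0), ([1.0], 1.0)]
candidates = [[0.9], [0.1], [0.2], [0.8]]
updated = prescreen(candidates, archive, expensive_eval, top_k=2)
print(len(calls), len(updated))
```

Each exact evaluation is added back to the archive, so the surrogate becomes more reliable in the regions the search actually visits.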

Quantitative Performance Analysis

Comparative Performance Metrics

Table 1: Performance Comparison of MFEA Variants on Drug Discovery Benchmarks

Algorithm Variant	Average AUC	Success Rate	Computational Speedup	Negative Transfer Incidence
Single-Task EA	0.709	64.2%	1.0x	N/A
Standard MFEA	0.690	61.6%	1.8x	37.7%
Group-Selected MFEA	0.719	68.9%	2.1x	21.3%
Adaptive MP-MFEA	0.734	72.5%	2.4x	12.8%

Table 2: MFEA Application Across Drug Discovery Tasks

Application Domain	Tasks Combined	Performance Gain	Key Transfer Mechanism
Drug-Target Interaction Prediction	268 targets grouped by ligand similarity	15.3% average AUC improvement	Shared molecular representation across similar targets
Multi-Property Optimization	Solubility, permeability, metabolic stability	2.9x convergence acceleration	Substructure pattern transfer
Chemical Reaction Optimization	Yield, selectivity, safety	47% reduction in experimental iterations	Reaction condition knowledge sharing

Case Study: Drug-Target Interaction Prediction

Experimental Framework

Background: Predicting drug-target interactions constitutes a fundamental challenge in drug discovery, particularly with limited labeled data for novel targets. Multi-task learning approaches have demonstrated potential but often suffer from negative interference between dissimilar targets [25].

MFEA Implementation:

Task Grouping: 268 targets clustered into 103 groups based on ligand similarity using Similarity Ensemble Approach (SEA)
Representation: Extended-connectivity fingerprints (ECFP4) combined with protein sequence descriptors
Population Structure: Multipopulation MFEA with 300 individuals per cluster
Knowledge Transfer: Adaptive migration policy based on measured AUC improvements

Results Analysis: The group-selected MFEA approach achieved significantly higher average AUC (0.719) compared to single-task learning (0.709) and standard MFEA (0.690). The method demonstrated particularly strong performance improvement for targets with limited training data, where knowledge transfer from data-rich similar targets provided maximum benefit [25]. Negative transfer was effectively minimized through the similarity-based grouping strategy, with only 21.3% of tasks experiencing performance degradation compared to 37.7% in ungrouped MFEA.

Protocol: Similarity-Based Task Grouping

Objective: Group drug discovery tasks to maximize positive knowledge transfer while minimizing negative interference.

Procedure:

Similarity Computation: Calculate target similarity using Tanimoto coefficient on ligand sets or structural homology metrics.
Hierarchical Clustering: Apply average-linkage hierarchical clustering to build task similarity dendrogram.
Cluster Determination: Cut dendrogram at threshold maximizing cross-task performance correlation.
Validation: Verify cluster coherence through internal validation metrics and biological relevance.
MFEA Configuration: Implement separate subpopulations for each coherent task cluster.
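The first three steps of this grouping procedure can be sketched in plain Python. The example computes the Tanimoto coefficient on ligand sets and merges clusters by average linkage until the best similarity falls below a fixed threshold. The fixed threshold is a simplifying assumption (the protocol selects the cut from cross-task performance correlation), and the target names and ligand identifiers are made up.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient between two ligand sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_linkage_clusters(ligand_sets, threshold):
    """Agglomerate targets while the best average similarity >= threshold."""
    clusters = [[name] for name in ligand_sets]

    def avg_sim(c1, c2):
        sims = [tanimoto(ligand_sets[a], ligand_sets[b]) for a in c1 for b in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        sim, i, j = max(
            ((avg_sim(clusters[i], clusters[j]), i, j)
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda t: t[0],
        )
        if sim < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


ligand_sets = {
    "T1": {"a", "b", "c"},
    "T2": {"a", "b", "d"},
    "T3": {"x", "y"},
    "T4": {"x", "y", "z"},
}
print(average_linkage_clusters(ligand_sets, threshold=0.4))
```

Each resulting cluster would then receive its own subpopulation, so that transfer happens only among targets with overlapping ligands.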

The Researcher's Toolkit: Essential MFEA Components

Table 3: Research Reagent Solutions for MFEA Implementation

Component	Function	Implementation Examples
Multitask Representation	Encodes solutions for multiple tasks	NetKey encoding [20], Graph neural networks [24]
Skill Factor Assignment	Identifies task affinity for each individual	Random assignment, Fitness-based bias [21]
Adaptive Migration Controller	Regulates knowledge transfer between tasks	Performance-based migration rate adjustment [20]
Surrogate Models	Accelerates expensive fitness evaluations	Multilayer perceptrons, Radial basis functions [26]
Task Similarity Metrics	Quantifies relatedness between tasks	Ligand-based similarity [25], Performance profiling
Negative Transfer Detection	Identifies and mitigates harmful knowledge transfer	Performance degradation monitoring [20]

Multi-Factorial Evolutionary Algorithms represent a powerful paradigm for addressing the complex, multi-objective challenges inherent in modern drug discovery. By enabling implicit knowledge transfer between related tasks, MFEAs accelerate convergence, improve solution quality, and facilitate the discovery of compounds that simultaneously optimize multiple pharmacological properties. The architectural blueprints presented in this work provide researchers with practical protocols for implementing MFEA approaches across diverse drug discovery applications, from target identification to lead optimization.

Future research directions include the integration of MFEA with large-language models for molecular design, the development of federated MFEA approaches for distributed drug discovery collaborations, and the application of multi-factorial optimization to emerging modalities such as PROTACs and molecular glues [24] [23]. As artificial intelligence continues to transform pharmaceutical research, MFEAs offer a robust framework for navigating the complex trade-offs and multi-objective decisions that define successful drug development campaigns.

Evolutionary computation and neural network training represent two foundational pillars of modern artificial intelligence research. Their convergence has created powerful hybrid algorithms capable of solving complex optimization problems, particularly in data-scarce domains like drug discovery. A significant innovation within this domain is the development of dual-population strategies featuring independent evolution with bidirectional knowledge transfer. These frameworks maintain multiple, distinct populations that evolve independently to explore different regions of the search space or exploit different aspects of a problem. Through carefully designed bidirectional transfer mechanisms, these populations share acquired knowledge, leading to accelerated convergence, enhanced solution diversity, and superior overall performance compared to single-population approaches.

The core principle involves orchestrating a synergistic relationship where populations with complementary search characteristics (such as one prioritizing objective optimization and another focusing on constraint satisfaction) mutually enhance each other's evolutionary trajectory [27] [28]. This paradigm is especially potent in evolutionary multitasking, where solutions to multiple, potentially related, optimization problems are sought simultaneously. By formulating complex tasks like drug property prediction and molecular optimization as multitasking problems, these strategies leverage cross-task insights to discover solutions that might remain elusive with traditional, isolated optimization methods [29] [24].

Core Principles and Mechanisms

Architectural Framework

Dual-population strategies are defined by their maintenance of two co-evolving populations, each with a distinct evolutionary role. The architecture is not merely redundant but is designed for functional specialization.

Driving Population (P_drive): This population is typically tasked with aggressive objective optimization, often with relaxed constraints. Its purpose is to pioneer high-performance regions of the search space, providing strong selection pressure toward the unconstrained Pareto front [27].
Conventional/Normal Population (P_normal): This population operates with a more conservative strategy, strictly adhering to feasibility constraints. It ensures that the search process maintains a repository of valid, feasible solutions, balancing objectives with constraint satisfaction [27].

The power of this architecture emerges from the bidirectional knowledge transfer connecting these populations. This is not a simple periodic exchange of solutions, but a sophisticated, often adaptive, sharing of genetic or learned information.
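The functional split between the two populations can be illustrated with their selection rules. In the sketch below both populations select survivors from a shared pool of candidates, which is one simple way to let discoveries flow in both directions. A single scalar objective and a scalar constraint violation are simplifying assumptions; this is not the selection scheme of any specific algorithm cited here.

```python
def select_drive(pool, size):
    """P_drive: rank by objective only, ignoring constraint violation."""
    return sorted(pool, key=lambda s: s["objective"])[:size]


def select_normal(pool, size):
    """P_normal: feasible solutions first (by objective), then the
    infeasible ones ordered by how badly they violate the constraints."""
    return sorted(
        pool,
        key=lambda s: (s["violation"] > 0, s["violation"], s["objective"]),
    )[:size]


# Shared pool: offspring of both populations (made-up values, minimization).
pool = [
    {"objective": 1.0, "violation": 2.0},
    {"objective": 2.0, "violation": 0.0},
    {"objective": 3.0, "violation": 0.0},
    {"objective": 0.5, "violation": 5.0},
]
p_drive = select_drive(pool, 2)
p_normal = select_normal(pool, 2)
print([s["objective"] for s in p_drive], [s["objective"] for s in p_normal])
```

Here P_drive keeps the two best objective values even though they are infeasible, while P_normal keeps the two feasible solutions, mirroring the division of roles described above.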

Knowledge Transfer Modalities

The transfer of knowledge between populations can be implemented through several mechanisms, each with distinct advantages:

Individual Migration: Selected individuals (elites or promising offspring) from one population are periodically injected into the other. This direct transfer introduces building blocks of high-quality solutions directly into the partner population's gene pool [29].
Model-Based Transfer: Instead of transferring raw solutions, the internal models or search biases of one population are used to influence the reproduction or selection processes of the other. For instance, a probabilistic model of a high-performing region discovered by P_drive can guide the generation of offspring in P_normal [24].
Fitness-Based Knowledge Sharing: The most common approach involves using genetic material from one population to create offspring in the other via crossover-like operations. A hybrid update strategy combining local and global search can be employed to effectively integrate this external knowledge, improving the quality and diversity of both populations [29].

Applications in Drug Discovery and Bioinformatics

The pharmaceutical industry, with its inherently high failure rates and costly development pipelines, stands to benefit immensely from advanced optimization techniques like dual-population strategies [30]. These methods are being integrated into end-to-end platforms such as Baishenglai (BSL), which unify multiple drug discovery tasks within a single, multi-task learning framework [24].

Table 1: Applications of Dual-Population Strategies in Drug Discovery

Application Area	Specific Task	Impact of Dual-Population Strategy
Target Identification	Positive-Unlabeled (PU) Learning for Target-Disease Association [30] [29]	An auxiliary population (`P_a`) identifies more reliable positive samples, while the main population (`P_o`) performs standard classification, overcoming label scarcity [29].
Molecular Optimization	Constrained Multi-Objective Optimization (CMOP) for Compound Design [27]	Balances multiple conflicting objectives (e.g., potency, solubility) with complex constraints (e.g., synthetic accessibility, toxicity), avoiding local optima [27].
Property Prediction	Drug-Target Interaction (DTI) & Drug-Drug Interaction (DDI) Prediction [24]	Enhances generalization on Out-of-Distribution (OOD) data by maintaining a diverse set of solution hypotheses, crucial for novel molecular structures [24].
Clinical Trial Analysis	Identification of Prognostic Biomarkers [30]	Improves the robustness of biomarker signatures by exploring a wider solution space, mitigating overfitting to limited clinical data [30].

Beyond direct drug discovery, the protein structure prediction field has seen related advances. For example, combined models using Bidirectional Recurrent Neural Networks (BiRNN) demonstrate how processing sequence information in both forward and backward directions (a conceptual cousin to bidirectional knowledge transfer) yields a more comprehensive context for accurate secondary structure prediction [31].

Quantitative Performance Analysis

Empirical validation across numerous benchmark problems and real-world applications consistently demonstrates the superiority of dual-population strategies over single-population and non-collaborative algorithms.

Table 2: Performance Comparison of Selected Dual-Population Algorithms

Algorithm	Benchmark / Domain	Key Performance Metric	Result vs. Baseline Algorithms
EMT-PU (Evolutionary Multitasking for PU Learning) [29]	12 PU Learning Datasets	Classification Accuracy	Consistently outperformed several state-of-the-art PU learning methods [29].
CMOEA-DDC (Constrained Multi-Objective EA) [27]	Various CMOEA Test Problems & Real-World Scenarios	Overall Performance	Significantly outperformed seven representative CMOEAs [27].
DCP-RLa (Dual-Population Collaborative Prediction) [28]	CEC2018 Dynamic Problems	Inverted Generational Distance (IGD)	Showed effectiveness and superiority in tracking dynamic Pareto fronts [28].
BSL Platform (Integrates multiple ML models) [24]	Various Drug Discovery Tasks (DTI, DDI, etc.)	Success Rate in Real-World Assays	Identified three novel bioactive compounds for GluN1/GluN3A NMDA receptor in vitro [24].

The performance gains are primarily attributed to two factors: (1) the complementary search focus of the two populations, which ensures a balanced approach to convergence and diversity, and (2) the bidirectional knowledge transfer, which prevents either population from stagnating and allows them to leverage each other's discoveries [27] [28]. In dynamic environments, the reinforcement learning-adjusted collaboration in algorithms like DCP-RLa further optimizes this balance based on real-time performance feedback [28].

Experimental Protocols

Protocol 1: Implementing EMT-PU for Positive-Unlabeled Learning

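This protocol outlines the steps to apply the EMT-PU algorithm to a positive-unlabeled classification task, such as drug interaction prediction in drug discovery; the same formulation also covers non-biomedical PU problems such as fake review detection [29].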

1. Problem Formulation and Dataset Preparation:

Task Definition: Define the original task (T_o) as a standard PU classification task to distinguish both positive and negative samples from an unlabeled set.
Auxiliary Task Creation: Construct the auxiliary task (T_a) focused specifically on discovering more reliable positive samples from the unlabeled set.
Data Processing: Format your data where each sample is a feature vector. The initial labeled set should contain only positive samples, with the remainder being unlabeled.

2. Algorithm Initialization:

Population Setup: Initialize two populations:
- P_o: To solve the original task T_o.
- P_a: To solve the auxiliary task T_a. A competition-based initialization strategy is recommended to accelerate its convergence [29].
Parameter Setting: Define evolutionary parameters (population size, crossover and mutation rates) and the knowledge transfer frequency.

3. Evolutionary Cycle with Bidirectional Transfer:

Independent Evolution: Evolve P_o and P_a independently for one generation using chosen evolutionary operators (selection, crossover, mutation).
Knowledge Transfer:
- Transfer from P_a to P_o: Implement a hybrid update strategy. Use high-quality individuals from P_a to influence the evolution of P_o, improving the quality of its individuals [29].
- Transfer from P_o to P_a: Implement a local update strategy. Use individuals from P_o to promote the diversity of P_a [29].
Evaluation: Evaluate all individuals in both populations against their respective task objectives (T_o or T_a).

4. Termination and Model Selection:

Stopping Condition: Repeat the evolutionary cycle until a termination criterion is met (e.g., a maximum number of generations or performance convergence).
Final Model: Select the best-performing classifier from the final P_o population for deployment.
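The four steps above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the EMT-PU implementation from [29]: the two objectives `f_o` and `f_a` are toy numeric functions standing in for the PU classification and positive-discovery tasks, individuals are real-valued vectors, and the transfer step is a plain migrant exchange rather than the hybrid and local update strategies described in the source. All function and parameter names are hypothetical.

```python
import random

def evolve(pop, fitness, rng, sigma=0.1):
    """One generation: Gaussian mutation plus elitist truncation selection."""
    children = [[g + rng.gauss(0.0, sigma) for g in ind] for ind in pop]
    merged = sorted(pop + children, key=fitness)  # lower fitness is better
    return merged[:len(pop)]

def transfer(source_sorted, target, fitness_target, n_migrants):
    """Replace the worst n_migrants of target with the best of the source."""
    ranked = sorted(target, key=fitness_target)
    keep = ranked[:len(target) - n_migrants]
    return keep + [list(ind) for ind in source_sorted[:n_migrants]]

def dual_population_search(f_o, f_a, dim=5, size=20, gens=60,
                           transfer_every=5, n_migrants=2, seed=0):
    rng = random.Random(seed)
    new = lambda: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    p_o = [new() for _ in range(size)]  # population for the original task T_o
    p_a = [new() for _ in range(size)]  # population for the auxiliary task T_a
    for gen in range(1, gens + 1):
        # Independent evolution of both populations.
        p_o = evolve(p_o, f_o, rng)
        p_a = evolve(p_a, f_a, rng)
        if gen % transfer_every == 0:
            # Bidirectional exchange; both directions use the pre-transfer state.
            best_a = sorted(p_a, key=f_a)
            best_o = sorted(p_o, key=f_o)
            p_o, p_a = (transfer(best_a, p_o, f_o, n_migrants),
                        transfer(best_o, p_a, f_a, n_migrants))
    return min(p_o, key=f_o)  # final model comes from P_o

# Toy stand-ins for the two task objectives (assumptions, not PU classifiers).
f_o = lambda x: sum(v * v for v in x)
f_a = lambda x: sum((v - 0.1) ** 2 for v in x)
best = dual_population_search(f_o, f_a)
```

In a real PU setting, each individual would encode a classifier and `f_o` / `f_a` would be replaced by the task-specific evaluation on the labeled-positive and unlabeled sets.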

Protocol 2: Dual-Population Collaborative Prediction for Dynamic Optimization

This protocol is adapted from the DCP-RLa algorithm for solving Dynamic Multi-objective Optimization Problems (DMOPs), relevant to adaptive drug scheduling or real-time treatment personalization [28].

1. Dynamic Detection and History Archiving:

Change Detection: Implement a mechanism to detect environmental changes in the optimization problem (e.g., shifting patient response models).
History Storage: Archive the final populations (P_{t-1}, P_{t-2}, etc.) from previous, static environments.

2. Dual-Population Prediction: Upon detecting a change, simultaneously generate two subpopulations for the new environment:

Cluster-based Multiple Prediction (CMP):
- Cluster the historical population from the last environment (P_{t-1}) in the decision space.
- Apply a second-order difference prediction model to each cluster's center and its history to forecast its new location.
- Generate the CMP subpopulation around these predicted centers to ensure convergence [28].
Manifold Prediction based on Knee Points (MPKP):
- Identify knee points and non-dominated solutions from historical populations.
- Predict the new location of knee points using an autoregressive (AR) model.
- Use the Closest-to-Ideal (CTI) point and random reinitialization to generate a diverse MPKP subpopulation that estimates the manifold of the new Pareto Front, enhancing diversity [28].
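As an illustration of the center-forecasting step in CMP, the sketch below extrapolates a cluster center from its three most recent positions. It assumes one plausible reading of "second-order difference prediction", namely adding both the latest first-order difference and the change in that difference to the current center; the exact model used in DCP-RLa [28] may differ. The helper names and the Gaussian sampling around the predicted center are hypothetical.

```python
import random

def predict_center(c_t2, c_t1, c_t):
    """Extrapolate the next center from the three most recent centers."""
    pred = []
    for a, b, c in zip(c_t2, c_t1, c_t):
        first = c - b               # latest first-order difference
        second = (c - b) - (b - a)  # second-order difference
        pred.append(c + first + second)
    return pred

def sample_around(center, n, sigma, rng):
    """Generate n individuals scattered around a predicted center."""
    return [[v + rng.gauss(0.0, sigma) for v in center] for _ in range(n)]

# Centers of one cluster in environments t-2, t-1, and t (toy values).
history = [[0.0, 1.0], [1.0, 1.5], [3.0, 2.0]]
predicted = predict_center(*history)
subpopulation = sample_around(predicted, n=10, sigma=0.05, rng=random.Random(0))
```

With these toy values the first coordinate is accelerating (steps of 1 then 2), so the forecast continues the acceleration, while the second coordinate moves at a constant rate and is extrapolated linearly.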

3. Reinforcement Learning-Based Fusion:

Strategy Evaluation: Assess the performance (e.g., convergence and diversity metrics) of the CMP and MPKP subpopulations in the new environment.
Q-Learning Adjustment: Use a Q-learning algorithm to adaptively decide the proportion of individuals from each subpopulation to form the final, combined initial population for the new environment. This balances diversity and convergence based on real-time performance [28].
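The fusion step can be sketched with tabular Q-learning. This is a hedged toy, not the DCP-RLa design: the state definition (two alternating toy states), the action set (three candidate shares of CMP individuals), and the reward (a stand-in that pretends a 75% CMP share is ideal) are all assumptions made for illustration. In practice the reward would come from the convergence and diversity metrics measured in the new environment.

```python
import random

ACTIONS = [0.25, 0.5, 0.75]  # assumed candidate shares taken from the CMP subpopulation

def choose(q, state, eps, rng):
    """Epsilon-greedy action selection over the Q-table row for this state."""
    if rng.random() < eps:
        return rng.randrange(len(ACTIONS))
    row = q[state]
    return row.index(max(row))

def update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])

def fuse(cmp_pop, mpkp_pop, share, size):
    """Build the initial population from the two subpopulations."""
    n_cmp = round(share * size)
    return cmp_pop[:n_cmp] + mpkp_pop[:size - n_cmp]

rng = random.Random(0)
q = [[0.0] * len(ACTIONS) for _ in range(2)]  # two toy states
state = 0
for _ in range(400):
    a = choose(q, state, eps=0.2, rng=rng)
    reward = -abs(ACTIONS[a] - 0.75)  # toy reward, not a real performance metric
    nxt = 1 - state                   # toy state transition
    update(q, state, a, reward, nxt)
    state = nxt

best_share = [ACTIONS[row.index(max(row))] for row in q]
fused = fuse(list(range(10)), list(range(100, 110)), best_share[0], size=8)
```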

4. Optimization Cycle:

The fused population is then used to initialize a standard Multi-Objective Evolutionary Algorithm (MOEA) for further optimization in the new environment until the next change is detected.

Workflow and System Diagrams

The following diagram illustrates the core logical structure and workflow of a generalized dual-population strategy with bidirectional knowledge transfer, integrating concepts from the cited protocols.

Diagram 1: Generalized workflow of a dual-population evolutionary algorithm with bidirectional knowledge transfer.

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Computational Tools and Frameworks

Tool/Reagent	Type/Purpose	Function in Research	Example/Reference
TensorFlow / PyTorch	Programmatic Framework	Provides the foundational open-source libraries for building and training deep learning models, including those used in evolutionary multitasking [30].	[30]
Scikit-learn	ML Library	Offers basic evaluation metrics (e.g., F1 score, AUC) and standard ML algorithms for benchmarking and component use within larger evolutionary frameworks [30].	[30]
Baishenglai (BSL) Platform	Integrated Drug Discovery Platform	An open-access platform that integrates seven core tasks (e.g., DTI, DDI) using advanced deep learning, facilitating the application of these methods without building pipelines from scratch [24].	[24]
Positive-Unlabeled (PU) Benchmarks	Standardized Datasets	Publicly available datasets (e.g., from UCI Repository) used to train and validate PU learning algorithms like EMT-PU, enabling reproducible research [29].	[29]
CEC Benchmark Suites	Optimization Problem Sets	Standardized test problems (e.g., CEC2018 for dynamic problems) for fairly comparing the performance of different constrained and dynamic multi-objective optimization algorithms [28].	[28]

Epithelial-mesenchymal transition (EMT) is a critical biological process in cancer progression, during which epithelial cells lose their polarity and cell-cell adhesion and gain migratory and invasive properties, acquiring a mesenchymal phenotype. This transition, driven by genetic and epigenetic alterations, facilitates cancer metastasis and is associated with therapy resistance [32]. In breast cancer, type-3 EMT (oncogenic EMT in carcinoma cells) arises from tumor microenvironmental cues, including hypoxia, growth factors, and inflammatory cytokines, that collectively drive invasion and metastasis [32].

The identification of EMT-related biomarkers presents a fundamental machine learning challenge: traditional supervised learning requires completely annotated datasets, but in practice, many positive biomarker instances remain unlabeled in large-scale omics studies. This scenario creates an ideal application for positive-unlabeled (PU) learning, where only some positive samples are labeled alongside many unlabeled samples of unknown status [33]. Evolutionary multitasking (EM) provides a powerful framework to address this challenge by simultaneously solving multiple related learning tasks, leveraging their synergies to improve overall performance in biomarker discovery.

EMT Signaling Pathways and Molecular Drivers

Core EMT Biomarkers and Functional Classification

Table 1: Key Molecular Markers in Epithelial-Mesenchymal Transition

Category	Biomarker	Functional Role in EMT	Detection Method
Epithelial Markers (Loss)	E-cadherin (CDH1)	Cell-cell adhesion molecule; downregulation enables dissociation	IHC, Western Blot [32]
	Cytokeratins	Structural integrity of epithelial cells; loss increases plasticity	Immunofluorescence [32]
Mesenchymal Markers (Gain)	N-cadherin	Promotes cell motility and invasion; cadherin switching	RNA-seq, IHC [32]
	Vimentin	Intermediate filament providing mechanical support	IHC, Proteomics [32]
	Fibronectin	Extracellular matrix component facilitating migration	Mass spectrometry [32]
Transcription Factors	SNAI1/Snail	Represses E-cadherin transcription	ChIP-seq, RNA-seq [32]
	TWIST1	Regulates actin cytoskeleton reorganization	scRNA-seq [32]
	ZEB1/2	Transcriptional repressors of epithelial genes	ATAC-seq, RNA-seq [32]
Matrix Metalloproteinases	MMP-2, MMP-9	Degrade type IV collagen in basement membrane	Zymography, Proteomics [32]
	MMP-3, MMP-7	Cleave E-cadherin; disrupt cell-cell adhesion	LC-MS/MS [32]

EMT Signaling Pathway Architecture

Positive-Unlabeled Learning Framework for EMT Biomarker Discovery

Problem Formulation and Mathematical Foundation

In traditional binary classification for biomarker discovery, the training set consists of labeled positive (P) and negative (N) samples: \( D = \{(x_i, y_i)\}_{i=1}^{n} \), where \( y_i \in \{0,1\} \). However, in PU learning for EMT biomarker identification, only some positive samples are labeled, while the remaining positives and all negatives form the unlabeled set (U): \( D = P \cup U \), where \( U \) contains both positive and negative samples [33].

The key insight of PU learning is that the unlabeled set can be treated as negative samples with class prior probability \( \pi = P(y=1) \) incorporated to adjust the loss function. For convolutional neural networks applied to histopathology images with incomplete annotations, the standard binary cross-entropy loss

\[ L = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log p(x_i) + (1-y_i) \log(1-p(x_i)) \right] \]

is reformulated for PU learning as [33]:

\[ L_{PU} = -\frac{1}{n_P} \sum_{x \in P} \log p(x) - \frac{1}{n_U} \sum_{x \in U} \left[ \log(1-p(x)) - \pi \log(1-p(x)) \right] \]

where \( n_P \) and \( n_U \) are the numbers of positive and unlabeled samples, and \( \pi \) is the class prior probability.
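To make the two losses concrete, the following sketch evaluates them on plain Python lists of predicted probabilities. It implements the PU loss exactly in the form stated in the text; note that the unlabeled term in that form is algebraically equal to \( (1-\pi)\log(1-p(x)) \). The function names and the small `eps` guard against `log(0)` are assumptions added for illustration.

```python
import math

def bce_loss(probs, labels, eps=1e-12):
    """Standard binary cross-entropy over fully labeled samples."""
    n = len(probs)
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels)) / n

def pu_loss(p_pos, p_unl, prior, eps=1e-12):
    """PU loss in the form given in the text.

    p_pos: predicted P(y=1) for labeled positives (set P)
    p_unl: predicted P(y=1) for unlabeled samples (set U)
    prior: class prior pi
    """
    pos_term = -sum(math.log(p + eps) for p in p_pos) / len(p_pos)
    unl_term = -sum(math.log(1 - p + eps) - prior * math.log(1 - p + eps)
                    for p in p_unl) / len(p_unl)
    return pos_term + unl_term

# With every prediction at 0.5, the loss is (1 + (1 - pi)) * ln 2.
example = pu_loss([0.5, 0.5], [0.5, 0.5, 0.5], prior=0.15)
```

Raising the prior shrinks the penalty for assigning high scores to unlabeled samples, which reflects the expectation that a larger share of the unlabeled set is actually positive.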

Evolutionary Multitasking PU Learning Architecture

Experimental Protocol for EMT Biomarker Identification

Data Acquisition and Preprocessing

Multi-omics Data Integration:

Genomic Data: Download somatic mutation data from The Cancer Genome Atlas (TCGA) using the Genomic Data Commons Data Portal. Focus on mutations in EMT-related pathways (TGF-β, Wnt, Notch).
Transcriptomic Data: Obtain RNA-seq data for breast cancer samples from TCGA-BRCA and METABRIC cohorts. Apply TPM normalization and batch effect correction using ComBat.
Proteomic Data: Acquire mass spectrometry-based proteomics data from Clinical Proteomic Tumor Analysis Consortium (CPTAC). Normalize using quantile normalization.
Epigenomic Data: Collect DNA methylation arrays (Illumina Infinium MethylationEPIC). Process with minfi package for β-value calculation.

Positive Label Definition:

Curate known EMT biomarkers from CIViCmine database and literature review [34]
Define positive set as proteins with established predictive biomarker evidence for targeted cancer therapeutics
Remaining proteins constitute the unlabeled set for PU learning

Evolutionary Multitasking Implementation

Table 2: Multi-Task Configuration for EMT Biomarker Discovery

Task ID	Objective	Data Modality	Positive Labels	Evaluation Metric
T1	Transcription Factor Biomarkers	RNA-seq + ATAC-seq	SNAI1, TWIST1, ZEB1	AUC-PR, F1-score
T2	Extracellular Matrix Biomarkers	Proteomics + Glycomics	MMP2, MMP9, VIM	Precision@10, ROC-AUC
T3	Cell Surface Receptor Biomarkers	Phosphoproteomics	EGFR, FGFR, TGFBR	Matthews Correlation Coefficient
T4	Metabolic Reprogramming Biomarkers	Metabolomics + RNA-seq	GLUT1, CAV1, PKM2	Balanced Accuracy

Algorithm 1: Evolutionary Multitasking PU Learning for EMT Biomarkers

Model Training and Validation

Feature Selection:

Apply minimum redundancy maximum relevance (mRMR) filtering to reduce dimensionality
Retain top 500 features per modality based on mutual information with positive labels
Conduct network-based feature expansion using protein-protein interaction networks

Model Configuration:

Implement XGBoost classifiers with PU learning adjustment [34] [35]
Set class prior π = 0.15 based on literature estimates of known EMT biomarkers
Use 5-fold nested cross-validation to prevent data leakage
Apply SMOTE-Tomek links for class imbalance correction in the latent positive set [35]

Performance Metrics:

Calculate Biomarker Probability Score (BPS) as normalized summative rank across models [34]
Compute area under precision-recall curve (AUC-PR) as primary metric for imbalanced data
Report F1-score, balanced accuracy, and Matthews correlation coefficient
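The text describes the Biomarker Probability Score only as a "normalized summative rank across models" and does not give a formula. The sketch below shows one possible reading, stated as an assumption: each model ranks the candidates by its score, ranks are summed per candidate, and the sum is divided by the maximum attainable total. The function names and tie handling (average rank) are hypothetical choices and may not match the procedure in [34].

```python
def rank_scores(scores):
    """Return 1-based ranks (1 = lowest score); ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def biomarker_probability_score(model_scores):
    """model_scores: one list of per-candidate scores for each model."""
    n = len(model_scores[0])
    total = [0.0] * n
    for scores in model_scores:
        for idx, r in enumerate(rank_scores(scores)):
            total[idx] += r
    max_total = n * len(model_scores)  # every model ranks the candidate first
    return [t / max_total for t in total]

# Two toy models scoring three candidate proteins.
bps = biomarker_probability_score([[0.9, 0.2, 0.5], [0.8, 0.1, 0.6]])
```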

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Research Materials for EMT Biomarker Studies

Reagent/Category	Specific Examples	Experimental Function	Application Context
Antibodies for IHC	Anti-E-cadherin, Anti-vimentin, Anti-N-cadherin	Protein localization and expression validation	Tissue microarray staining; confirmation of EMT state [32]
qPCR Assays	TaqMan assays for SNAI1, TWIST1, ZEB1, CDH1	mRNA expression quantification	Validation of transcriptomic biomarkers; cost-effective screening [32]
Cell Lines	MCF-10A, MCF-7, MDA-MB-231, HMLE	EMT model systems in vitro	Controlled experimentation; pathway manipulation studies [32]
Cytokine Cocktails	TGF-β1, EGF, TNF-α	EMT induction in epithelial cells	Positive control establishment; mechanistic studies [32]
Protease Inhibitors	GM6001 (MMP inhibitor), Marimastat	MMP activity blockade	Functional validation of MMP biomarkers; therapeutic testing [32]
siRNA/shRNA Libraries	SNAI1 siRNA, TWIST1 shRNA	Knockdown of EMT transcription factors	Functional validation of candidate biomarkers; pathway analysis [32]

Results Interpretation and Validation Framework

Performance Benchmarking

Table 4: Comparative Performance of EM-PU Learning vs. Baseline Methods

Method	AUC-PR	Precision@50	BPS Score	Novel Biomarkers
EM-PU Learning (Proposed)	0.82 ± 0.04	0.76 ± 0.05	0.88 ± 0.03	42
Single-task PU Learning	0.71 ± 0.06	0.64 ± 0.07	0.75 ± 0.05	28
Supervised Random Forest	0.62 ± 0.08	0.53 ± 0.09	0.65 ± 0.07	15
Positive-Negative Learning	0.58 ± 0.09	0.49 ± 0.10	0.61 ± 0.08	12

Experimental Validation Workflow

Statistical Analysis and Clinical Correlation

Perform survival analysis using Kaplan-Meier curves and log-rank test for prognostic validation
Conduct multivariate Cox proportional hazards regression adjusting for clinical covariates
Implement receiver operating characteristic (ROC) analysis for diagnostic performance
Calculate hazard ratios and 95% confidence intervals for clinical impact assessment

This protocol provides a comprehensive framework for applying evolutionary multitasking with positive-unlabeled learning to EMT biomarker discovery, enabling researchers to leverage incomplete annotations while capturing the complexity of epithelial-mesenchymal transition in cancer progression.

High-dimensional data, characterized by a vast number of features relative to sample size, presents significant challenges in machine learning and biomedical research. The process of feature selection (FS) is crucial for identifying the most discriminative features, improving model interpretability, and reducing computational costs [16] [36]. Traditional FS methods often struggle with the exponential growth of the search space and complex feature interactions inherent in high-dimensional datasets, such as those from genomics, medical imaging, and drug discovery [16] [37].

Evolutionary multitasking (EMT) has emerged as a powerful paradigm for enhancing evolutionary algorithms by leveraging knowledge transfer across multiple optimization tasks. This approach is particularly well-suited for feature selection, as it enables the construction of simplified, complementary tasks that facilitate more efficient exploration of the complex feature space [16] [38]. The DREA-FS algorithm represents an advanced implementation of this concept, specifically designed for multi-objective feature selection (MOFS) in high-dimensional classification scenarios [16].

This case study details the application notes and experimental protocols for DREA-FS, providing researchers with a comprehensive framework for implementing this methodology in biomedical data analysis, particularly in drug development contexts where both accuracy and interpretability are paramount.

The Multi-Objective Feature Selection Problem

Feature selection inherently involves optimizing multiple conflicting objectives. The standard multi-objective FS formulation aims to simultaneously minimize both the number of selected features and the classification error rate [16] [38]. For a dataset with D features, this can be formally expressed as:

min F(x) = (f₁(x), f₂(x)), subject to x ∈ {0,1}^D

Where:

f₁(x) represents the classification error rate
f₂(x) represents the number of selected features (cardinality of the subset)
x is a binary vector indicating whether each feature is selected (1) or not (0)

Because the search space grows exponentially with D (2^D possible subsets) and finding an optimal feature subset is NP-hard in general, exhaustive search is impractical, which motivates sophisticated optimization approaches such as evolutionary algorithms [16] [36].
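The bi-objective formulation can be illustrated with a short sketch of Pareto dominance for minimization. The `toy_error` function is a made-up stand-in for a classifier's cross-validated error (it pretends features 0 and 2 are the informative ones); the dominance test and the non-dominated filter are the standard definitions.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (both minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def non_dominated(points):
    """Keep the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

def objectives(mask, error_rate):
    """Return (f1, f2) for a binary feature mask."""
    return (error_rate(mask), sum(mask))

# Hypothetical error model: features 0 and 2 each reduce the error by 0.1.
toy_error = lambda m: 0.5 - 0.1 * (m[0] + m[2])
f = objectives((1, 0, 1, 0), toy_error)

# (error rate, subset size) pairs for five candidate subsets.
candidates = [(0.10, 5), (0.12, 3), (0.10, 7), (0.30, 1), (0.15, 3)]
front = non_dominated(candidates)
```

Here `(0.10, 7)` is dominated by `(0.10, 5)` (same error, fewer features) and `(0.15, 3)` by `(0.12, 3)`, so three trade-off solutions remain on the front.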

DREA-FS Innovation Framework

DREA-FS addresses the limitations of conventional MOFS methods through two key innovations:

Dual-Perspective Dimensionality Reduction Strategy: Constructs simplified and complementary tasks using distinct dimensionality reduction methods to rapidly identify promising regions in the feature space [16].
Dual-Archive Multitask Optimization Mechanism: Maintains separate archives for preserving solution diversity and elite guidance, enhancing the ability to identify multiple feature subsets with equivalent performance (multimodal solutions) [16].

Table 1: Core Components of the DREA-FS Framework

Component	Type	Primary Function	Key Innovation
Filter-based Reduction	Task Formulation	Generate simplified task via statistical feature ranking	Rapid identification of promising feature regions
Group-based Reduction	Task Formulation	Create complementary task via feature clustering	Captures complex feature interactions
Elite Archive	Optimization Mechanism	Preserves solutions with best convergence properties	Guides population toward Pareto-optimal solutions
Diversity Archive	Optimization Mechanism	Maintains feature subsets with equivalent performance	Enables identification of multimodal solutions

Figure 1: DREA-FS workflow illustrating the dual-perspective reduction strategy and dual-archive optimization mechanism.

Research Reagent Solutions

Table 2: Essential Computational Tools and Frameworks for DREA-FS Implementation

Research Reagent	Category	Specific Implementation Examples	Application in DREA-FS
Evolutionary Algorithm Framework	Optimization Library	PlatEMT, Pymoo, DEAP	Provides base optimization algorithms and multitasking infrastructure
Dimensionality Reduction Methods	Feature Preprocessing	mRMR, ReliefF, SPEC	Implements filter-based and group-based task formulation
Classifier Models	Evaluation Metric	SVM, Random Forest, k-NN	Evaluates feature subset quality for fitness assignment
Performance Metrics	Validation Tools	Hypervolume, IGD, Classification Accuracy	Quantifies algorithm performance and solution quality
Statistical Testing	Validation Framework	Wilcoxon signed-rank test, t-test	Provides statistical significance for performance comparisons

Application Notes: Experimental Design and Validation

Dataset Selection and Preparation

For comprehensive validation, DREA-FS should be evaluated across diverse benchmark datasets with varying dimensionalities and problem characteristics:

Table 3: Recommended Dataset Characteristics for DREA-FS Validation

Dataset Type	Feature Dimension Range	Sample Size	Domain Examples	Key Evaluation Focus
Low-Dimensional	10 - 100 features	100 - 1000 samples	UCI Repository standards	Baseline performance comparison
Medium-Dimensional	100 - 1000 features	50 - 500 samples	Gene expression datasets	Search efficiency in larger spaces
High-Dimensional	1,000 - 10,000 features	20 - 200 samples	Neuroimaging, genomics	Scalability and convergence analysis
Ultra-High-Dimensional	10,000+ features	10 - 100 samples	Whole-genome sequencing	Robustness to extreme dimensionality

Proper data preprocessing is essential before applying DREA-FS:

Normalization: Apply z-score or min-max normalization to ensure features are on comparable scales
Missing Value Handling: Implement appropriate imputation strategies (e.g., k-nearest neighbors imputation)
Class Balance Assessment: Address severe class imbalance using techniques like SMOTE oversampling [39]
Data Partitioning: Employ stratified k-fold cross-validation (typically k=5 or k=10) to ensure representative training and testing splits

Performance Metrics and Evaluation Protocol

Comprehensive evaluation requires multiple performance metrics to assess different aspects of algorithm performance:

Table 4: Multi-Objective Feature Selection Performance Metrics

Metric Category	Specific Metrics	Evaluation Focus	Interpretation Guidance
Convergence	Hypervolume (HV), Inverted Generational Distance (IGD)	Proximity to true Pareto front	Higher HV and lower IGD indicate better convergence
Diversity	Spread, Spacing	Distribution and spread of solutions	Lower values indicate more uniform distribution
Classification Performance	Accuracy, Precision, Recall, F1-score, AUC	Quality of selected feature subsets	Standard interpretation for classification metrics
Complexity	Feature subset size, Computational time	Practical utility and efficiency	Smaller subsets and shorter times are preferred
Multimodality	Equivalent solution count, Feature diversity	Ability to identify alternative subsets	Higher counts indicate better multimodality discovery

Experimental Protocols

Protocol 1: DREA-FS Implementation and Parameter Configuration

Objective: Implement the core DREA-FS algorithm with optimal parameter settings for high-dimensional feature selection.

Materials:

Python 3.7+ with PlatEMT framework or equivalent evolutionary computation library
Scikit-learn for classifier implementation and performance evaluation
NumPy and SciPy for numerical computations
Benchmark datasets from Table 3

Procedure:

Algorithm Initialization
- Set population size N = 100 (adjust proportionally based on problem dimensionality)
- Initialize binary population with uniform random feature selection
- Set maximum function evaluations (MFE) = 10,000 as stopping criterion

Task Formulation Phase
- Filter-based Task Construction: Apply mutual information or Pearson correlation to rank features, select top K features (K = D/2 for initial configuration)
- Group-based Task Construction: Apply hierarchical clustering or k-means (k = D/10) to group correlated features, select representative features from each cluster
Evolutionary Optimization Configuration
- Apply binary tournament selection for parent selection
- Implement simulated binary crossover (SBX) with probability pc = 0.9
- Implement polynomial mutation with probability pm = 1/D
- Set distribution indices for crossover (ηc = 20) and mutation (ηm = 20)
Dual-Archive Management
- Elite Archive: Maintain non-dominated solutions with maximum capacity of 100 individuals
- Diversity Archive: Maintain equivalent-performance solutions with maximum capacity of 50 individuals
- Implement archive update after each generation using non-dominated sorting and crowding distance
Knowledge Transfer Mechanism
- Implement bidirectional transfer every 5 generations
- Apply individual-based transfer with probability r = 0.4
- Use feature mask exchange for filter-based tasks and weight transfer for group-based tasks
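The filter-based task construction from the procedure can be sketched as follows. This is an illustration under assumptions rather than the DREA-FS code: it ranks features by absolute Pearson correlation with the label (one of the two filters the protocol mentions), keeps the top K = D/2, and shows how a solution found in the reduced task maps back to a mask over all D features. The helper names are hypothetical, and the mapping is only one simple way to realize a feature mask exchange.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 for constant inputs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def filter_task(X, y, k):
    """Indices of the k features with the largest |r| to the label."""
    d = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(d)]
    return sorted(sorted(range(d), key=lambda j: -scores[j])[:k])

def to_full_mask(reduced_mask, kept, d):
    """Map a mask over the reduced task back to the original D features."""
    full = [0] * d
    for bit, j in zip(reduced_mask, kept):
        full[j] = bit
    return full

# Toy data: 4 samples, D = 4 features, binary labels.
X = [[1, 5, 0, 2], [2, 3, 1, 2], [3, 6, 0, 2], [4, 4, 1, 2]]
y = [0, 0, 1, 1]
kept = filter_task(X, y, k=len(X[0]) // 2)   # K = D/2
full = to_full_mask([1, 0], kept, d=len(X[0]))
```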

Figure 2: Detailed DREA-FS algorithmic workflow showing the main procedural components.

Protocol 2: Comparative Performance Analysis

Objective: Evaluate DREA-FS against state-of-the-art feature selection methods across multiple benchmark datasets.

Materials:

Implementation of comparative algorithms (single-objective FS, traditional MOFS, other EMT-based methods)
Benchmark datasets from Table 3
Statistical testing framework (e.g., scipy.stats)

Procedure:

Algorithm Selection
- Include single-objective FS methods (e.g., PSO, GA)
- Include traditional multi-objective FS methods (e.g., NSGA-II, MOEA/D)
- Include recent EMT-based FS methods (e.g., MO-FSEMT [38], PSO-EMT)
- Implement all algorithms with population size = 100 and MFE = 10,000 for fair comparison

Experimental Setup
- Conduct 30 independent runs for each algorithm-dataset combination
- Use 5-fold cross-validation for performance evaluation
- Employ SVM with linear kernel as base classifier for consistency
Performance Assessment
- Calculate all metrics from Table 4 for each algorithm
- Record computational time for efficiency comparison
- Generate Pareto front visualizations for qualitative assessment
Statistical Analysis
- Apply Wilcoxon signed-rank test with α = 0.05 for statistical significance
- Calculate p-values for pairwise comparisons between DREA-FS and each competitor
- Apply Bonferroni correction for multiple testing where appropriate
Multimodality Assessment
- Count distinct feature subsets with equivalent classification performance (<1% difference)
- Calculate Jaccard distance between equivalent subsets to quantify feature diversity
- Compare multimodality discovery capability across algorithms
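The multimodality assessment can be sketched in a few lines. The sketch assumes that "equivalent" means an absolute accuracy gap below 0.01 relative to the best subset found, which is one reading of the "<1% difference" criterion; the function names and toy results are hypothetical.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two feature-index collections."""
    a, b = set(a), set(b)
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)

def equivalent_subsets(results, tol=0.01):
    """Distinct subsets whose accuracy is within tol of the best accuracy.

    results: list of (feature_indices, accuracy) pairs.
    """
    best = max(acc for _, acc in results)
    seen, out = set(), []
    for feats, acc in results:
        key = frozenset(feats)
        if best - acc < tol and key not in seen:
            seen.add(key)
            out.append(key)
    return out

def mean_pairwise_distance(subsets):
    pairs = list(combinations(subsets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)

results = [([0, 1, 2], 0.90), ([0, 3, 4], 0.895),
           ([1, 2, 0], 0.90), ([5, 6], 0.80)]
equiv = equivalent_subsets(results)
diversity = mean_pairwise_distance(equiv)
```

The third entry is the same subset as the first in a different order, so it is counted once; the two remaining equivalent subsets share only feature 0, which gives a Jaccard distance of 0.8.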

Protocol 3: Biomedical Application Case Study

Objective: Apply DREA-FS to a real-world biomedical feature selection problem, specifically focusing on schizophrenia identification using functional brain networks [40].

Materials:

Resting-state fMRI data from schizophrenia patients and healthy controls
Preprocessed functional connectivity matrices
Clinical diagnostic labels for supervised learning

Procedure:

Data Preprocessing
- Preprocess rs-fMRI data using standard neuroimaging pipelines (e.g., FSL, SPM)
- Construct functional connectivity matrices representing correlations between brain regions
- Extract upper triangular elements of connectivity matrices as feature vectors
- Apply appropriate normalization (e.g., Fisher's z-transform) to correlation values

DREA-FS Configuration for Neuroimaging
- Adapt filter-based task to prioritize connections with high group difference (t-test)
- Configure group-based task to cluster functionally related brain regions
- Adjust population size based on feature dimensionality (typically 1,000-10,000 features)
- Set classification objective as schizophrenia vs. control discrimination
Validation Framework
- Implement leave-site-out cross-validation for multi-site data
- Compare with clinical standard feature selection methods
- Assess robustness through bootstrap sampling
Interpretability Analysis
- Identify consistently selected functional connections across cross-validation folds
- Map selected features to known brain networks (e.g., default mode, salience networks)
- Compare with literature on schizophrenia neuropathology
Counterfactual Explanation (Extension)
- Implement counterfactual analysis to determine minimal feature changes that alter predictions [40]
- Identify critical functional connections that differentiate patient and control classifications
- Generate hypotheses regarding potential intervention targets
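The feature-extraction part of the data preprocessing step can be sketched as below: the upper triangle of a symmetric connectivity matrix is flattened into a feature vector, and each correlation is passed through Fisher's z-transform (the inverse hyperbolic tangent). The clipping constant and function names are assumptions added to keep the transform finite for correlations of exactly ±1.

```python
import math

def upper_triangle(matrix):
    """Flatten the strictly upper-triangular entries of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def fisher_z(r, clip=0.999999):
    """Fisher z-transform, z = atanh(r), with clipping near |r| = 1."""
    r = max(-clip, min(clip, r))
    return math.atanh(r)

# Toy 3-region connectivity matrix (Pearson correlations).
conn = [[1.0, 0.5, -0.2],
        [0.5, 1.0, 0.0],
        [-0.2, 0.0, 1.0]]
features = [fisher_z(r) for r in upper_triangle(conn)]
```

For n brain regions this yields n(n-1)/2 features per subject, which is why even moderate parcellations produce the thousands of features mentioned in the configuration step.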

The DREA-FS algorithm represents a significant advancement in multi-task multi-objective feature selection for high-dimensional data. Through its dual-perspective reduction strategy and dual-archive optimization mechanism, it effectively addresses key challenges in high-dimensional feature selection, including slow convergence, limited search capability, and the inability to identify multimodal solutions [16].

For researchers implementing this methodology, careful attention to parameter configuration is essential, particularly regarding the balance between exploration and exploitation. The population size should scale with problem dimensionality, while knowledge transfer probability should be tuned to maximize positive transfer while minimizing negative interference. Additionally, the complementary nature of the filter-based and group-based tasks is crucial for the algorithm's performance: the former provides rapid convergence guidance while the latter maintains diversity and discovers complex feature interactions.

In biomedical applications like drug development, DREA-FS offers particular value by identifying multiple equivalent feature subsets, providing flexibility when certain features are costly or difficult to measure in clinical practice. The algorithm's ability to maintain diverse solutions while achieving competitive classification performance makes it particularly suitable for biomarker discovery and clinical decision support systems where both accuracy and interpretability are critical requirements.

Solving Real-World Problems: Navigating Negative Transfer and Optimization Pitfalls

Negative transfer describes a phenomenon in machine learning where knowledge acquired from a source task interferes with, rather than improves, learning and performance on a related target task [41]. In the context of evolutionary multitasking and neural network training, this represents a significant challenge, as it can undermine the core objective of multi-task learning (MTL), which is to leverage commonalities and differences across tasks to enable more efficient learning and superior performance compared to single-task models [42] [1].

The fundamental cause of negative transfer is the discrepancy in the joint distributions between the source and target domains [41]. When a model learns non-transferable, task-specific features from the source domain, these features can act as noise or misleading signals for the target task, leading to performance degradation. This problem is particularly acute in fields like drug design, where data is often sparse and heterogeneous [43]. Mitigating negative transfer is therefore critical for the successful application of MTL and transfer learning in scientific domains.

The following tables summarize key quantitative data from experiments relevant to identifying and mitigating negative transfer, particularly in a drug discovery context.

Table 1: Summary of Protein Kinase Inhibitor (PKI) Dataset for Transfer Learning [43]

Protein Kinase (PK)	Total Unique PKIs	Active PKIs (Ki < 1000 nM)	Percentage Active	Total PK Annotations
PK 1	474	151	31.9%	> 55,141 (Total)
PK 2	1028	363	35.3%	...
...	...	...	...	...
PK 19	> 400	> 151	25 - 50%	...

Table 2: Performance Comparison of Mitigation Strategies on Benchmark Tasks

Mitigation Strategy	Base Model Performance (F1)	Performance with Mitigation (F1)	Relative Improvement	Key Mechanism
Exponential Moving Average Loss Weighting [42]	0.78	0.85	+8.97%	Loss balancing based on observed magnitudes
Meta-Learning Framework [43]	0.72	0.81	+12.50%	Optimal source sample selection & weight initialization
Two-Level Transfer Learning (TLTL) [1]	0.75	0.83	+10.67%	Inter-task and intra-task knowledge transfer
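
The "Relative Improvement" column in Table 2 follows from the two F1 columns as (mitigated - base) / base. A minimal check of that arithmetic, with the F1 values copied from the table and a hypothetical helper name:

```python
def relative_improvement(base: float, improved: float) -> float:
    """Relative change from `base` to `improved`, in percent."""
    return (improved - base) / base * 100.0

# (base F1, F1 with mitigation) pairs copied from Table 2
rows = {
    "EMA loss weighting": (0.78, 0.85),
    "Meta-learning framework": (0.72, 0.81),
    "TLTL": (0.75, 0.83),
}

for name, (base, improved) in rows.items():
    print(f"{name}: {relative_improvement(base, improved):+.2f}%")
```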

Experimental Protocols

Protocol 1: Meta-Learning for Sample Selection and Weight Initialization

This protocol outlines the methodology for mitigating negative transfer by identifying an optimal subset of source samples for pre-training [43].

Problem Formulation:
- Target Dataset: Define the data-scarce target task, ( T^{(t)} = \{(x_i^t, y_i^t, s^t)\} ), where ( x ) is the input (e.g., a molecule), ( y ) is the label (e.g., active/inactive), and ( s ) is a context vector (e.g., protein sequence).
- Source Dataset: Define the collective source data from related tasks, ( S^{(-t)} = \{(x_j^k, y_j^k, s^k)\}_{k \neq t} ).
Model Definition:
- Base Model (( f )): A model (e.g., a neural network) with parameters ( \theta ) for the primary prediction task (e.g., binary classification). It is trained on the weighted source data.
- Meta-Model (( g )): A model with parameters ( \varphi ) that predicts a weight for each source data point based on its features and context.
Meta-Training Loop:
- The meta-model ( g ) assigns a weight to each instance in ( S^{(-t)} ).
- The base model ( f ) is pre-trained on ( S^{(-t)} ) using a loss function weighted by the outputs of ( g ).
- The pre-trained base model ( f ) is then evaluated on a validation set from the target task ( T^{(t)} ), and the validation loss is computed.
- This validation loss is used to update the parameters ( \varphi ) of the meta-model ( g ), teaching it to assign higher weights to source samples that lead to better performance on the target task.
Final Training:
- After meta-training, the final base model is pre-trained on the optimally weighted source data and then fine-tuned on the full target dataset ( T^{(t)} ).
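
The meta-training loop above can be illustrated with a deliberately small sketch. It is not the implementation from [43], and it makes several simplifying assumptions:

- The base model f is a one-parameter linear regressor rather than a neural network.
- The meta-model g is reduced to one free logit per source sample, instead of a network over features and context.
- Pre-training is a single gradient step from theta = 0.
- The meta-gradient is estimated by finite differences.
- All data points are toy values, and the final fine-tuning step is omitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def pretrain_one_step(phi, source, lr=0.1):
    """One gradient step of f(x) = theta * x, starting at theta = 0,
    on the source loss weighted by sigmoid(phi_j)."""
    theta = 0.0
    weights = [sigmoid(p) for p in phi]
    grad = sum(w * 2.0 * (theta * x - y) * x
               for w, (x, y) in zip(weights, source)) / len(source)
    return theta - lr * grad

def validation_loss(theta, target_val):
    """Mean squared error of the pre-trained model on target validation data."""
    return sum((theta * x - y) ** 2 for x, y in target_val) / len(target_val)

def meta_train(source, target_val, steps=50, meta_lr=1.0, eps=1e-4):
    """Update the per-sample logits phi so that pre-training on the weighted
    source data lowers the target validation loss."""
    phi = [0.0] * len(source)
    for _ in range(steps):
        grads = []
        for j in range(len(phi)):
            up, down = phi[:], phi[:]
            up[j] += eps
            down[j] -= eps
            loss_up = validation_loss(pretrain_one_step(up, source), target_val)
            loss_down = validation_loss(pretrain_one_step(down, source), target_val)
            grads.append((loss_up - loss_down) / (2 * eps))  # central difference
        phi = [p - meta_lr * g for p, g in zip(phi, grads)]
    return [sigmoid(p) for p in phi]

# Toy data: the first two source samples follow the target relation y = 2x,
# the last two follow the conflicting relation y = -2x.
source = [(1.0, 2.0), (2.0, 4.0), (1.0, -2.0), (2.0, -4.0)]
target_val = [(1.0, 2.0), (3.0, 6.0)]
weights = meta_train(source, target_val)
print([round(w, 3) for w in weights])
```

In this toy setting the learned weights rise above 0.5 for the source samples that agree with the target relation and fall below 0.5 for the conflicting ones. This is the behaviour the protocol relies on to suppress negative transfer.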

Protocol 2: Two-Level Transfer Learning Algorithm (TLTL)

This protocol is designed for evolutionary multitasking optimization to reduce negative transfer by structuring knowledge sharing [1].

Initialization:
- Initialize a population of individuals with a unified coding scheme.
- Set an inter-task transfer learning probability (( tp )).
Upper-Level: Inter-Task Transfer Learning:
- If a random value > ( tp ), perform inter-task knowledge transfer.
- Crossover: Implement chromosome crossover between individuals from different tasks.
- Elite Individual Learning: Exploit knowledge from the best-performing individuals (elites) across tasks to guide the search, reducing randomness compared to simple assortative mating.
Lower-Level: Intra-Task Transfer Learning:
- Transmit information from one dimension to other dimensions within the same optimization task.
- This accelerates convergence by leveraging inherent task structures and correlations.
Evaluation and Selection:
- Evaluate individuals based on multifactorial fitness (factorial cost, factorial rank, skill factor, scalar fitness) [1].
- Select elite individuals for each task to form the next generation.
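
The four multifactorial fitness quantities named in the evaluation step can be computed as in the sketch below. It follows the usual MFEA definitions:

- Factorial rank is an individual's position when the population is sorted by cost on one task.
- Skill factor is the task with the best rank.
- Scalar fitness is the reciprocal of the best rank.

The function name and the toy cost matrix are illustrative. Ties are broken by population order, which is an assumption.

```python
def multifactorial_fitness(costs):
    """costs[i][j] is the factorial cost of individual i on task j (lower is better).

    Returns (factorial ranks, skill factors, scalar fitness values).
    """
    n, k = len(costs), len(costs[0])
    ranks = [[0] * k for _ in range(n)]
    for j in range(k):
        # Sort individuals by cost on task j; rank 1 is the best.
        order = sorted(range(n), key=lambda i: costs[i][j])
        for position, i in enumerate(order, start=1):
            ranks[i][j] = position
    # Skill factor: the task on which the individual ranks best.
    skill = [min(range(k), key=lambda j: ranks[i][j]) for i in range(n)]
    # Scalar fitness: reciprocal of the best factorial rank.
    scalar = [1.0 / min(ranks[i]) for i in range(n)]
    return ranks, skill, scalar

# Three individuals evaluated on two tasks (toy values).
costs = [[1.0, 9.0], [2.0, 3.0], [3.0, 1.0]]
ranks, skill, scalar = multifactorial_fitness(costs)
print(ranks, skill, scalar)
```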

Visualizations

[Figure: Meta-Learning Framework for Negative Transfer Mitigation]

[Figure: Two-Level Transfer Learning Workflow]

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Materials and Computational Tools for Negative Transfer Research

Item / Resource	Function / Description	Example Use Case
Curated Protein Kinase Inhibitor (PKI) Dataset [43]	A labeled dataset of chemical compounds and their bioactivities against specific protein targets; serves as the foundational data for source and target tasks.	Pre-training and fine-tuning models for drug activity prediction in low-data regimes.
Extended Connectivity Fingerprint (ECFP4) [43]	A circular fingerprint representation of molecular structure that encodes atoms and their neighborhoods; used as input features for machine learning models.	Converting SMILES strings of compounds into a fixed-length, numerical vector for model consumption.
Meta-Weight-Net Algorithm [43]	A meta-learning algorithm that learns to assign weights to individual training samples based on their loss.	Differentiating between useful and harmful source samples during pre-training.
Model-Agnostic Meta-Learning (MAML) Algorithm [43]	A meta-learning algorithm designed to find model weight initializations that allow for fast adaptation to new tasks with few gradient steps.	Preparing a base model for rapid fine-tuning on a novel, data-scarce target task.
Multifactorial Evolutionary Algorithm (MFEA) [1]	An evolutionary computation framework that solves multiple optimization tasks simultaneously by leveraging implicit transfer learning.	Conducting evolutionary multitasking optimization across related drug design problems.

Within the broader context of evolutionary multitasking neural network training research, a fundamental challenge is the effective selection and grouping of tasks to maximize knowledge transfer while minimizing interference. In computational chemistry and drug discovery, where data for individual molecular property prediction tasks is often scarce, this challenge becomes particularly acute. Multi-task learning (MTL) presents a powerful solution, operating on the principle that learning multiple related tasks simultaneously, using a shared representation, can improve generalization beyond what is achievable by learning each task in isolation [44] [45]. The core premise is that by leveraging the domain information contained in the training signals of related tasks, the model can develop a more robust and generalized internal representation [46]. The success of this paradigm, however, is critically dependent on the relatedness of the tasks being learned together. Grouping dissimilar tasks can lead to "negative transfer," where the performance on one or more tasks degrades due to interference from unrelated learning signals [47]. Therefore, the development of principled, data-driven methods for task selection and grouping is paramount for realizing the full potential of MTL in chemical domains. This document outlines application notes and protocols for leveraging chemical and biological similarity to construct effective multi-task learning groups, thereby enhancing the predictive performance of models for molecular property prediction.

Table 1: Key Research Reagent Solutions for MTL in Drug Discovery

Item Name	Function/Description
ChEMBL Database	A large-scale, open-access bioactivity database containing curated data on drug-like molecules and their effects on targets. Serves as a primary source for task-specific datasets [46] [47].
PubChem BioAssay	A public repository of biological screening results for small molecules. Used to gather datasets for groups of similar biological targets to build QSAR models [45].
SMILES/SELFIES Strings	Text-based representations of molecular structure. Serve as the fundamental input for many molecular featurization methods [48].
Molecular Graph Representation	A representation where atoms are nodes and bonds are edges. Enables the use of Graph Neural Networks (GNNs) to capture structural information [48] [49] [47].
Graph Neural Networks (GNNs)	A class of deep learning models that operate directly on graph structures. Used as the backbone architecture for learning from molecular graphs and extracting latent features [44] [48] [47].
Task Similarity Estimator (e.g., MoTSE)	A computational framework to quantitatively estimate the similarity between molecular property prediction tasks by analyzing pre-trained models, guiding effective task grouping and transfer learning [47].
FetterGrad Algorithm	An optimization algorithm designed for MTL that mitigates gradient conflicts between tasks by minimizing the Euclidean distance between task gradients, ensuring more stable and effective learning [48].

Quantitative Foundations: Performance of Multi-Task Learning Strategies

The efficacy of MTL strategies is empirically validated across diverse chemical prediction tasks. The tables below summarize key performance metrics from recent studies, highlighting the advantage of informed task grouping.

Table 2: Performance Comparison of MTL Strategies on QSAR Tasks

Strategy	Dataset	Key Metric	Performance	Context
Instance-based MTL	ChEMBL (1091 assays)	Number of Targets Where Strategy was Best	741 targets	Significantly outperformed single-task learning and feature-based MTL [46].
Feature-based MTL	ChEMBL (1091 assays)	Number of Targets Where Strategy was Best	179 targets	Outperformed single-task learning on a subset of targets [46].
Single-Task Learning	ChEMBL (1091 assays)	Number of Targets Where Strategy was Best	171 targets	Served as the baseline; performed best only when MTL was not beneficial [46].
MTL with Evolutionary Distance	ChEMBL	Predictive Accuracy	Significant Improvement	Incorporating evolutionary distance between protein targets as a similarity metric improved MTL QSAR performance [46].

Table 3: Performance of Advanced MTL Frameworks on Specific Drug Discovery Tasks

Model / Framework	Primary Task	Dataset(s)	Key Result	Comparison to Baseline
DeepDTAGen	Drug-Target Affinity (DTA) Prediction	KIBA, Davis, BindingDB	MSE: 0.146, CI: 0.897, r²m: 0.765 (on KIBA)	Outperformed traditional ML and deep learning models (e.g., GraphDTA) [48].
MoTSE-Guided Transfer Learning	Molecular Property Prediction	QM9, PCBA	Superior Prediction Performance	Outperformed multitask learning, training from scratch, and 9 self-supervised learning methods [47].
Multi-task GNNs	Molecular Property Prediction	QM9, Fuel Ignition Properties	Higher Prediction Quality	Controlled experiments showed MTL outperforms single-task models, especially in low-data regimes [44].

Application Notes: Protocols for Task Selection and Grouping

Protocol 1: Task Grouping Based on Evolutionary Distance of Protein Targets

Principle: Biological targets that are evolutionarily related often share similar binding sites and structural motifs, leading to similarities in the chemical profiles of their active compounds. This phylogenetic relatedness provides a powerful, biologically grounded metric for task grouping [46].

Procedure:

Data Compilation: For a set of protein targets (e.g., kinases, GPCRs), gather bioactivity data (e.g., IC50, Ki) from public databases like ChEMBL [46] or PubChem [45].
Sequence Alignment: Perform a multiple sequence alignment of the protein sequences for the selected targets.
Distance Matrix Calculation: Compute a pairwise evolutionary distance matrix from the sequence alignment. Common metrics include the Jukes-Cantor or Kimura distances, which estimate the number of substitutions per site.
Task Similarity Definition: The evolutionary distance matrix is directly used as, or transformed into, a task similarity matrix. A smaller evolutionary distance implies higher task relatedness.
Model Training (Instance-based MTL): Implement an instance-based MTL model. In this approach, the training data from all related tasks (targets) are pooled, often with instance weighting, to construct a learner for each individual task. The underlying assumption is that data instances from one task can inform predictions for a related task [46].
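
As a rough illustration of the distance-matrix and task-similarity steps, the sketch below computes a pairwise distance from already-aligned protein sequences and turns it into a similarity score. It uses Kimura's simple protein approximation, D = -ln(1 - p - 0.2p²), where p is the fraction of differing residues. The sequences, the target names, and the exp(-D) mapping from distance to similarity are assumptions made for this example. A real workflow would take the alignment from a dedicated tool.

```python
import math

def p_distance(a, b):
    """Fraction of differing residues over aligned columns without gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free aligned columns")
    return sum(x != y for x, y in pairs) / len(pairs)

def kimura_protein_distance(a, b):
    """Kimura's approximation D = -ln(1 - p - 0.2 * p^2)."""
    p = p_distance(a, b)
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0:
        return float("inf")  # sequences too divergent for the formula
    return -math.log(inner)

def task_similarity_matrix(aligned):
    """Map each pair of targets to a similarity in [0, 1] via exp(-D) (an assumption)."""
    names = list(aligned)
    return {
        (s, t): math.exp(-kimura_protein_distance(aligned[s], aligned[t]))
        for s in names for t in names
    }

# Hypothetical, already-aligned toy sequences ("-" marks a gap).
aligned = {
    "KIN_A": "MKVLAGDESL",
    "KIN_B": "MKVLAGDESV",
    "KIN_C": "MRILSGEE-L",
}
similarity = task_similarity_matrix(aligned)
print(round(similarity[("KIN_A", "KIN_B")], 3), round(similarity[("KIN_A", "KIN_C")], 3))
```

Targets whose similarity exceeds a chosen threshold could then be pooled for instance-based MTL. The threshold itself would need to be tuned per dataset.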

Protocol 2: Task Similarity Estimation via Model Embedding (MoTSE Framework)

Principle: The similarity between two molecular property prediction tasks can be inferred from the similarity of the "knowledge" encapsulated in their task-specific trained models. Two tasks are similar if their optimal models make decisions based on comparable molecular features [47].

Procedure:

Single-Task Pre-training: For each candidate task (T_i), pre-train a Graph Neural Network (GNN) in a supervised manner on its respective dataset. This creates a set of task-specific expert models.
Knowledge Extraction with a Probe Dataset: Using a common, unlabeled probe dataset of molecules:
- Attribution Method: For each pre-trained model, compute atom-level importance scores (e.g., using Saliency Maps or Integrated Gradients) for every molecule in the probe set. This captures local, atomic-level knowledge.
- Molecular Representation Similarity Analysis (MRSA): Extract the final molecular representation (embedding) from each pre-trained model for all molecules in the probe set. This captures global, molecular-level knowledge.
Task Embedding Projection: Aggregate the attribution scores and molecular representations across the probe dataset to form a fixed-dimensional vector that represents the "knowledge" of each task. Project all tasks into a unified latent task space based on these vectors.
Similarity Calculation: Calculate the pairwise similarity between tasks from their corresponding vectors in the latent task space, either directly (e.g., cosine similarity) or by converting a distance (e.g., Euclidean distance) so that a smaller distance means a higher similarity.
Task Grouping or Transfer Learning: Use the derived similarity matrix to:
- Group Tasks: Cluster tasks (e.g., using k-means) to form cohesive groups for MTL.
- Guide Transfer Learning: For a target task with limited data, select the most similar source task (according to MoTSE) and fine-tune the source model on the target data [47].
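
The similarity calculation and source-task selection steps can be sketched as follows. The task vectors here are made-up three-dimensional stand-ins for the task embeddings that the procedure would derive from the probe dataset. The function names are illustrative and are not part of the MoTSE code base.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two task vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar_source(target, task_vectors):
    """Pick the candidate source task whose vector is closest to the target's."""
    candidates = {name: vec for name, vec in task_vectors.items() if name != target}
    return max(
        candidates,
        key=lambda name: cosine_similarity(task_vectors[target], candidates[name]),
    )

# Hypothetical task embeddings (toy values).
task_vectors = {
    "solubility": [0.9, 0.1, 0.0],
    "logP": [0.8, 0.3, 0.1],
    "toxicity": [0.0, 0.2, 0.9],
}
print(most_similar_source("solubility", task_vectors))
```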

Protocol 3: Symmetry-Aware Multitask Learning for Chemical Reactions

Principle: In chemical reaction tasks such as atom mapping, incorporating an auxiliary, self-supervised task can force the model to learn more robust and generalizable representations of molecular graphs, which in turn improves performance on the primary task [49].

Procedure:

Data Representation: Represent each chemical reaction as a pair of molecular graphs (reactants and products). Handle imbalanced reactions (where reactant and product atom counts differ) using graph padding strategies [49].
Model Architecture: Design a multi-branch neural network, typically with a shared encoder (e.g., a GNN) that processes the molecular graphs, followed by two task-specific heads:
- Primary Task Head: Predicts the atom mapping between reactants and products (framed as a graph matching problem).
- Auxiliary Task Head: Performs a self-supervised task, such as predicting molecular symmetry or a related graph property, which does not require additional labels.
Joint Training: Train the model by simultaneously minimizing a weighted sum of the losses from the primary and auxiliary tasks. This encourages the shared encoder to develop features that are informative for both objectives.
Post-Prediction Refinement: After the model generates initial atom mappings, apply a post-processing step using an algorithm like the Weisfeiler-Lehman test to identify and account for topologically equivalent (symmetric) atoms, thereby refining the final mapping accuracy [49].
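
The refinement step relies on finding atoms that are topologically equivalent. A minimal sketch of one-dimensional Weisfeiler-Lehman colour refinement on a molecular graph is given below. Atoms that end with the same colour cannot be told apart by this test, so they are candidates for symmetric atoms. The graph encoding (element labels plus adjacency lists over heavy atoms) is an assumption. Note also that 1-WL is a heuristic: it can assign the same colour to atoms that are not truly symmetric.

```python
def wl_equivalence_classes(labels, adjacency, max_rounds=10):
    """Refine atom colours until the partition into colour classes stops changing.

    labels[i] is the element symbol of atom i; adjacency[i] lists its neighbours.
    """
    colours = list(labels)
    for _ in range(max_rounds):
        # New signature: own colour plus the sorted multiset of neighbour colours.
        signatures = [
            (colours[i], tuple(sorted(colours[j] for j in adjacency[i])))
            for i in range(len(colours))
        ]
        palette = {sig: idx for idx, sig in enumerate(sorted(set(signatures)))}
        refined = [palette[sig] for sig in signatures]
        stable = len(set(refined)) == len(set(colours))
        colours = refined
        if stable:  # refinement only splits classes, so an equal count means convergence
            break
    return colours

# Propane heavy atoms C0-C1-C2: the two terminal carbons are equivalent.
propane = wl_equivalence_classes(["C", "C", "C"], [[1], [0, 2], [1]])
# Ethanol heavy atoms C0-C1-O2: all three atoms are distinguishable.
ethanol = wl_equivalence_classes(["C", "C", "O"], [[1], [0, 2], [1]])
print(propane, ethanol)
```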

Integration with Evolutionary Multitasking Research

The protocols described herein align with and advance the core objectives of evolutionary multitasking research. The principle of "inter-task genetic transfers" in Evolutionary Algorithms (EAs), where genetic material evolved for one task proves useful for another, directly mirrors the knowledge-sharing objective of MTL [12]. The methodologies outlined provide a structured, data-driven approach to explicitly define and quantify the "latent synergy" between tasks, which is often assumed but not explicitly modeled in many evolutionary multitasking paradigms [12].

Furthermore, the MoTSE framework can be viewed as a systematic approach to building a "task-relatedness" map, which could guide the formulation of multi-task optimization problems in evolutionary computation. By identifying clusters of highly similar molecular property prediction tasks, researchers can define a multi-factorial optimization problem where each factor (task) is known to possess high complementarity with others, thereby increasing the likelihood of beneficial genetic transfer and improving the overall convergence and quality of solutions [12].

The FetterGrad algorithm, developed to mitigate gradient conflicts in deep learning-based MTL [48], also presents a compelling analogy for evolutionary multitasking. The challenge of negative transfer in MTL due to conflicting gradients is analogous to the potential for destructive crossover in EAs when tasks are unrelated. Incorporating a similar "conflict-aware" mechanism into evolutionary operators, perhaps one that measures and minimizes the "evolutionary distance" between potential parent solutions from different tasks, could be a fruitful area for research at the intersection of evolutionary computation and deep learning.

Dynamic Weighting Strategies for Efficient Multi-Source Knowledge Utilization

Evolutionary multitasking (EMT) represents a paradigm shift in computational intelligence, enabling the simultaneous solution of multiple optimization tasks within a single algorithmic run. This approach mirrors the efficiency of natural evolution, which concurrently cultivates organisms adapted to diverse ecological niches. A significant challenge within this framework is the effective utilization of knowledge distilled from multiple source tasks to enhance learning on a target problem. Dynamic weighting strategies have emerged as a critical mechanism to address this challenge, allowing for the adaptive prioritization and integration of knowledge sources based on their evolving relevance and utility. Within the context of evolutionary multitasking neural network training, these strategies facilitate a more efficient and robust search process, preventing the dominance of any single task and promoting synergistic knowledge transfer. This document outlines the application notes and experimental protocols for implementing dynamic weighting, drawing upon recent advancements in evolutionary computation and multi-objective reinforcement learning to guide researchers and drug development professionals.

Application Notes

Dynamic weighting strategies are designed to modulate the influence of different knowledge sources or objectives during the optimization process. Their application is particularly valuable in scenarios involving conflicting tasks or objectives with varying learning dynamics.

Core Principles and Methodological Approaches

The implementation of dynamic weighting is governed by several core principles. The foundational principle involves redirecting learning effort towards objectives with the greatest potential for improvement, thereby optimizing the allocation of computational resources [50]. Two sophisticated methodological approaches have been developed for this purpose:

Hypervolume-Guided Weight Adaptation: This method is applicable when user preferences for different objectives are known or can be specified. It operates by encouraging the evolutionary policy to discover new non-dominated solutions at each training step. The algorithm rewards new checkpoints that demonstrate a positive contribution to the hypervolume of the Pareto front, thereby proactively pushing the front in the desired optimization direction [50]. This ensures that the search process is continuously guided towards regions of the objective space that align with user-defined preferences.
Gradient-Based Weight Optimization: In scenarios where explicit user preferences are unavailable, a gradient-based approach offers a flexible alternative. This method computes the contribution of each objective's gradient to the overall improvement of the model's performance. By analyzing the alignment and magnitude of gradients from different tasks, the algorithm dynamically reallocates weights to balance the learning process [50]. This approach is especially powerful in highly non-convex and non-linear optimization landscapes, such as those encountered in neural network training, where static weighting schemes often fail to capture optimal trade-offs.

Advantages over Static Weighting

The transition from static to dynamic weighting addresses fundamental limitations inherent in traditional multi-objective optimization. Static linear scalarization, which uses fixed weights to combine multiple objectives into a single scalar function, is provably unable to capture solutions residing in non-convex regions of the Pareto front [50]. Furthermore, empirical studies reveal that different objectives possess varying learning difficulties, often leading to premature saturation of some tasks while others continue to improve. Dynamic weighting mitigates this by continuously rebalancing and reprioritizing objectives, facilitating a more thorough exploration of the objective space and enabling the discovery of superior, Pareto-dominant solutions [50].
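
The limitation of static linear scalarization can be seen in a three-point toy example for two minimised objectives. The point (0.6, 0.6) is non-dominated but lies in a non-convex part of the front, so no fixed weight ever selects it. The points and function names below are illustrative.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points):
    """Points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

def weighted_sum_winners(points, steps=101):
    """Points selected by minimising w*f1 + (1-w)*f2 over a grid of fixed weights."""
    winners = set()
    for s in range(steps):
        w = s / (steps - 1)
        winners.add(min(points, key=lambda p: w * p[0] + (1 - w) * p[1]))
    return winners

points = [(0.0, 1.0), (0.6, 0.6), (1.0, 0.0)]
print("non-dominated:", non_dominated(points))
print("reachable by weighted sum:", sorted(weighted_sum_winners(points)))
```

The scalarized value of (0.6, 0.6) is 0.6 for every weight, while one of the two extreme points always scores at most 0.5. The middle trade-off is therefore invisible to any static weighting.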

Experimental Protocols

The following protocols provide a detailed methodology for implementing and evaluating dynamic weighting strategies within an evolutionary multitasking framework for neural network training.

Protocol for Evolutionary Multitasking with Dynamic Weighting

This protocol is adapted from methodologies used in Evolutionary Multitasking for Positive and Unlabeled (PU) learning and dynamic reward weighting in reinforcement learning [29] [50].

1. Problem Formulation and Task Definition:

Define Component Tasks: Clearly delineate the multiple optimization tasks (e.g., T1, T2, ..., Tk). In a drug discovery context, these could involve predicting binding affinity, optimizing solubility, and minimizing toxicity.
Formulate as Multitasking Problem: Construct a multitasking optimization environment where a single population of individuals (e.g., neural networks) is evaluated against all k tasks simultaneously.

2. Algorithm Initialization:

Initialize Population: Create an initial population P of neural networks with random or heuristic-based weights.
Initialize Dynamic Weights: Set an initial weight w_i(0) for each task i. The weights can be uniform (1/k) or based on prior knowledge.
Specify Genetic Operators: Choose appropriate crossover and mutation operators for the evolutionary algorithm.

3. Evolutionary Cycle with Dynamic Weighting: The following process is repeated for each generation until a termination criterion is met (e.g., maximum number of generations or convergence).

Step 3.1: Evaluate Population: For each individual in P, compute its performance (fitness) on all k tasks.
Step 3.2: Calculate Dynamic Weights: Update the weight for each task i for the next generation, w_i(t+1), using one of the following methods:
- Hypervolume-Guided: Calculate the contribution of each task to the hypervolume of the current Pareto front approximation. Increase the weight for tasks whose improvement leads to a larger hypervolume gain [50].
- Gradient-Based: For each individual, compute the gradient of each task's loss. The new weight w_i(t+1) is adjusted based on the norm and direction of these gradients to maximize overall progress [50].
Step 3.3: Compute Composite Fitness: For each individual, aggregate its multi-task performance into a single scalar fitness value using the dynamically updated weights. A common method is the weighted sum: Fitness = Σ_i [w_i(t) * Fitness_i].
Step 3.4: Select and Reproduce: Apply a selection operator (e.g., tournament selection) based on the composite fitness to choose parents for the next generation.
Step 3.5: Apply Genetic Operators: Create offspring from the selected parents using crossover and mutation.
Step 3.6: Integrate Knowledge Transfer (Optional): Implement an explicit knowledge transfer mechanism, such as the bidirectional transfer used in EMT-PU [29]. For example, allow a percentage of high-fitness individuals from a source task to migrate and influence the population of a target task.

4. Output and Analysis:

Upon termination, the algorithm outputs the final population. The non-dominated solutions from this population approximate the Pareto-optimal set of neural networks, offering a range of trade-offs between the k tasks.
The evolution of the dynamic weights w_i(t) over generations should be analyzed to understand the relative importance and learning difficulty of each task throughout the process.

Table 1: Key Parameters for Evolutionary Multitasking Protocol

Parameter	Description	Recommended Value / Range
Population Size (`P`)	Number of individuals in the population	50 - 1000
Maximum Generations	Termination criterion	Problem-dependent
Weight Update Frequency	How often dynamic weights are recalculated	Every generation
Crossover Rate	Probability of applying crossover	0.6 - 0.9
Mutation Rate	Probability of applying mutation	0.01 - 0.1
Knowledge Transfer Rate	Proportion of individuals migrated between tasks	5% - 20%
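
The evolutionary cycle of step 3 can be sketched on a toy problem as shown below. Several parts are stand-ins chosen for brevity and should be read as assumptions:

- Individuals are plain real vectors rather than neural networks.
- The two tasks are simple quadratic costs, so lower is better.
- The weight update rewards each task's relative improvement in its best cost since the previous generation. This is a simple improvement-based rule, not the hypervolume-guided or gradient-based rules of [50].
- The optional knowledge-transfer step 3.6 is left out.

```python
import random

def task_costs(x):
    """Two toy tasks (hypothetical): squared distance to different optima."""
    return [sum((v - 1.0) ** 2 for v in x), sum((v + 1.0) ** 2 for v in x)]

def update_weights(prev_best, best, floor=0.05):
    """Stand-in rule: weight grows with a task's recent relative improvement."""
    gains = [(floor + max(0.0, (p - b) / p)) if p > 0 else floor
             for p, b in zip(prev_best, best)]
    total = sum(gains)
    return [g / total for g in gains]

def evolve(pop_size=50, dim=5, generations=40, cx_rate=0.8, mut_rate=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-3, 3) for _ in range(dim)] for _ in range(pop_size)]
    k = 2
    weights = [1.0 / k] * k          # uniform initial weights w_i(0)
    history = [weights]
    prev_best = None
    for _ in range(generations):
        costs = [task_costs(x) for x in pop]                       # Step 3.1
        best = [min(c[i] for c in costs) for i in range(k)]
        if prev_best is not None:
            weights = update_weights(prev_best, best)              # Step 3.2
        prev_best = best
        history.append(weights)
        composite = [sum(w * c for w, c in zip(weights, cs))       # Step 3.3
                     for cs in costs]

        def tournament():                                          # Step 3.4
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[a] if composite[a] <= composite[b] else pop[b]

        offspring = []
        while len(offspring) < pop_size:                           # Step 3.5
            p1, p2 = tournament(), tournament()
            if rng.random() < cx_rate:
                alpha = rng.random()  # arithmetic (blend) crossover
                child = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]
            else:
                child = p1[:]
            child = [v + rng.gauss(0, 0.3) if rng.random() < mut_rate else v
                     for v in child]
            offspring.append(child)
        pop = offspring
    return pop, history

population, weight_history = evolve()
print("final weights:", [round(w, 3) for w in weight_history[-1]])
```

Plotting `weight_history` over generations gives the weight trajectory that the analysis step recommends inspecting.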

Benchmarking and Evaluation Protocol

To ensure rigorous validation, the performance of any dynamic weighting strategy must be evaluated against established benchmarks and baselines.

1. Benchmark Selection:

Utilize standardized Multi-Task Optimization (MTO) test suites, such as those proposed for the CEC 2025 competition [12]. These include:
- Multi-Task Single-Objective Optimization (MTSOO) Suite: Contains nine complex problems with two tasks each and ten benchmark problems with fifty tasks each.
- Multi-Task Multi-Objective Optimization (MTMOO) Suite: Contains analogous problems for multi-objective tasks.

2. Experimental Settings:

Independent Runs: Execute the algorithm for a minimum of 30 independent runs per benchmark problem, each with a different random seed [12].
Computational Budget: Define a maximal number of function evaluations (maxFEs) as the termination criterion. For 2-task problems, maxFEs=200,000 is typical; for 50-task problems, maxFEs=5,000,000 is recommended [12].
Parameter Consistency: Keep the algorithmic parameters identical across all benchmark problems within a test suite to prevent over-fitting [12].

3. Data Recording:

Intermediate Results: At predefined checkpoints (k * maxFEs / Z for k = 1, ..., Z, where Z=100 for 2-task and Z=1000 for 50-task problems), record the algorithm's performance for each component task [12]. Here k is the checkpoint index, not the number of tasks.
Performance Metric: For single-objective tasks, record the Best Function Error Value (BFEV). For multi-objective tasks, record the Inverted Generational Distance (IGD) to measure convergence and diversity towards the true Pareto front [12].
Save data in a structured text file for post-processing.
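
The checkpoint schedule can be computed directly from maxFEs and Z. The sketch below assumes the checkpoint index runs from 1 to Z, so that the final checkpoint coincides with the full budget. The function name is illustrative.

```python
def checkpoints(max_fes, z):
    """Function-evaluation counts k * maxFEs / Z for k = 1, ..., Z."""
    return [k * max_fes // z for k in range(1, z + 1)]

two_task = checkpoints(200_000, 100)        # 2-task problems
fifty_task = checkpoints(5_000_000, 1000)   # 50-task problems
print(two_task[:3], two_task[-1])
print(fifty_task[:3], fifty_task[-1])
```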

4. Performance Comparison:

Baselines: Compare the dynamic weighting strategy against state-of-the-art static weighting methods and other EMT algorithms like the Multi-Factorial Evolutionary Algorithm (MFEA).
Overall Ranking: The final ranking is often based on the median performance (BFEV or IGD) across all runs and all component tasks at varying computational budgets [12].

Table 2: Quantitative Metrics for Benchmarking Dynamic Weighting Strategies

Metric	Formula/Description	Interpretation
Best Function Error Value (BFEV)	`BFEV = f(x_best) - f(x*)`, where `x_best` is the best solution found and `x*` is the global optimum. In practice, the best objective value found is often used directly [12].	Lower values indicate better performance. A value of 0 signifies the global optimum was found.
Inverted Generational Distance (IGD)	$IGD(P, P^*) = \frac{1}{|P^*|}\sqrt{\sum_{v \in P^*} \min_{u \in P} d(u,v)^2}$, where `P*` is the true Pareto front and `P` is the approximated front.	Lower IGD values indicate better convergence and diversity. An IGD of 0 means the approximated front matches the true front exactly.
Hypervolume (HV)	The volume of the objective space dominated by the approximated Pareto front, bounded by a reference point.	Higher HV values indicate a better and more diverse approximation of the Pareto front.
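
The IGD and hypervolume metrics from Table 2 can be computed for small fronts with the sketch below. IGD follows the form given in the table: the square root of the summed squared nearest distances, divided by the size of the true front. The hypervolume routine handles only two minimised objectives, whereas a library such as `pygmo` would be used for the general case. The function names are illustrative.

```python
import math

def igd(approx, true_front):
    """Inverted generational distance of `approx` with respect to `true_front`."""
    total = sum(min(math.dist(u, v) for u in approx) ** 2 for v in true_front)
    return math.sqrt(total) / len(true_front)

def hypervolume_2d(front, ref):
    """Area dominated by a two-objective minimisation front, bounded by `ref`."""
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    area, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < best_f2:  # dominated points add no area and are skipped
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area

true_front = [(0.0, 0.0), (1.0, 1.0)]
print(igd([(0.0, 1.0)], true_front))                       # sqrt(2) / 2
print(hypervolume_2d([(1.0, 2.0), (2.0, 1.0)], (3.0, 3.0)))
```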

Workflow Visualization

The following diagram illustrates the core operational workflow of an evolutionary multitasking algorithm incorporating dynamic weighting, as described in the experimental protocol.

[Figure: Evolutionary Multitasking with Dynamic Weighting Workflow]

The Scientist's Toolkit

This section details the essential computational reagents and resources required to implement the dynamic weighting strategies and experimental protocols outlined in this document.

Table 3: Essential Research Reagent Solutions for Evolutionary Multitasking

Item Name	Function / Role	Specification Notes
Multi-Task Benchmark Suites	Standardized problems for algorithm validation and comparison.	CEC 2025 MTSOO and MTMOO test suites [12]. These provide diverse problems with known optima to evaluate performance.
Evolutionary Algorithm Framework	Provides the core infrastructure for population management, selection, and genetic operations.	Frameworks like DEAP (Python) or custom implementations in C++/Julia. Must support multi-objective optimization.
Dynamic Weighting Module	A software component that implements the hypervolume-guided and/or gradient-based weight update rules.	This can be implemented as a separate function or class within the main algorithm. Requires hypervolume calculation libraries (e.g., `pygmo`).
Neural Network Library	Used to represent and train the individuals (brains) within the population.	TensorFlow, PyTorch, or JAX. The library should support automatic differentiation for gradient-based weight optimization.
High-Performance Computing (HPC) Resources	Computational power to execute the numerous independent runs required for statistical significance.	Access to cluster or cloud computing is recommended. The 50-task benchmarks require ~5 million function evaluations per run [12].

In the realm of evolutionary multitasking (EMT) for neural network training, the conflict between convergence speed and population diversity represents a fundamental challenge. Premature convergence can stagnate optimization in local minima, while excessive diversity impedes efficient convergence. Evolutionary Multitasking addresses this by solving multiple tasks simultaneously, leveraging knowledge transfer to enhance performance across tasks [51]. This article details practical protocols for balancing these objectives, with a focus on applications relevant to computational drug development.

Core Techniques and Their Applications

Dual-Population and Knowledge Transfer Strategies

EMT for Positive and Unlabeled (PU) Learning (EMT-PU):

Concept: Formulates PU learning as a bi-task optimization problem [29].
Implementation:
- Task Definitions: An auxiliary task (Ta) identifies more positive samples; the original task (To) performs standard PU classification.
- Dual Populations: Two populations (Pa and Po) evolve independently for Ta and To.
- Knowledge Transfer: A bidirectional strategy transfers knowledge between populations. Pa improves individual quality in Po, while Po promotes diversity in Pa [29].
Application: Ideal for drug discovery scenarios with limited labeled positive data (e.g., rare disease patients or novel compound targets).

Dual-Archive Multitask Optimization (DREA-FS):

Concept: Designed for multi-objective feature selection to identify complementary feature subsets [16].
Implementation:
- Dual Archives: An elite archive guides convergence; a diversity archive preserves feature subsets with equivalent performance but different compositions.
- Task Construction: Creates simplified tasks via filter-based and group-based dimensionality reduction.
Application: Enhances interpretability in high-dimensional biomarker discovery or genomic data analysis by providing multiple, equally predictive feature subsets.

Algorithmic Frameworks and Parameter Control

Variable and Segmented Parameter Control:

Dynamic Parameters: Parameters (e.g., γ in Zeroing Neural Networks) adjust based on system state or time, improving adaptability and convergence [52].
Segmented Variable-Parameter ZNN: Uses time-dependent parameters (e.g., μ1(t), μ2(t)) that change at a threshold δ0, balancing convergence speed and noise robustness [52].

Table 1: Key Algorithmic Frameworks for Convergence-Diversity Balance

Technique	Core Mechanism	Primary Application Context	Key Advantage
EMT-PU [29]	Bidirectional knowledge transfer between two specialized populations.	Positive and Unlabeled Learning (e.g., limited patient data).	Discovers more reliable positives, improving classification with scarce labels.
DREA-FS [16]	Dual-archive strategy (elite and diversity) with dual-perspective task reduction.	Multi-objective Feature Selection (e.g., biomarker identification).	Finds multiple, equally accurate feature subsets, aiding model interpretability.
Variable-Parameter ZNN [52]	Time- or state-dependent tuning of model parameters (e.g., γ).	Dynamic System Solving (e.g., robotic control, trajectory planning).	Ensures prescribed-time convergence and enhances robustness to disturbances.

Experimental Protocols

Protocol 1: Implementing EMT-PU for Drug-Target Interaction Prediction

Objective: Validate the EMT-PU algorithm on a Positive and Unlabeled learning task, such as predicting novel drug-target interactions where confirmed positive pairs are limited and many potential pairs are unlabeled.

Materials & Dataset:

Dataset: A drug-target interaction matrix with a small set of known interactions (positives) and a large set of unknown pairs (unlabeled) from public databases like DrugBank or STITCH.
Software: Python with evolutionary computation library (e.g., DEAP).

Procedure:

Task Formulation:
- Define the original task (To): A standard PU classification task on the entire dataset.
- Define the auxiliary task (Ta): A task focused on identifying potential positive samples from the unlabeled set.
Population Initialization:
- Initialize population Po for To with random feature weights.
- Initialize population Pa for Ta using a competition-based strategy to ensure high initial quality [29].
Independent Evolution:
- Evolve Po and Pa independently for a generation using a selected evolutionary algorithm (e.g., Genetic Algorithm).
- Evaluate Po using a standard PU classifier's performance.
- Evaluate Pa based on its success in identifying reliable positives from the unlabeled set.
Bidirectional Knowledge Transfer:
- Transfer from Pa to Po: Implement a hybrid update strategy. Select high-performing individuals from Pa and use their genetic material to guide the mutation or crossover of individuals in Po, improving Po's quality.
- Transfer from Po to Pa: Apply a local update strategy using genetic material from Po to increase the diversity of Pa, preventing it from converging too quickly to a single region of the solution space [29].
Iteration and Evaluation:
- Repeat the Independent Evolution and Bidirectional Knowledge Transfer steps for a predetermined number of generations.
- Final performance is evaluated by the classification accuracy of the optimized Po population on a held-out test set. Compare against state-of-the-art PU learning methods.
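
The loop described in Protocol 1 can be sketched as two populations that evolve separately and exchange material once per generation. This is a minimal pure-Python illustration under stated assumptions, not the EMT-PU implementation from [29]: the toy fitness functions `fit_o` and `fit_a`, the elitist mutation scheme, the replace-worst rule for Pa to Po transfer, and the gene-mixing rule for Po to Pa transfer are all hypothetical placeholders, and the competition-based initialization of Pa is replaced by plain random initialization.

```python
import random

def evolve_one_generation(pop, fitness, rng, sigma=0.1):
    """Toy elitist step: Gaussian-mutate every individual, keep the best half of parents + children."""
    children = [[g + rng.gauss(0.0, sigma) for g in ind] for ind in pop]
    return sorted(pop + children, key=fitness, reverse=True)[:len(pop)]

def transfer_quality(source, target, fit_source, fit_target, n_transfer=2):
    """Pa -> Po: the best source individuals replace the worst target ones if they score better."""
    donors = sorted(source, key=fit_source, reverse=True)[:n_transfer]
    target = sorted(target, key=fit_target, reverse=True)
    for i, donor in enumerate(donors):
        slot = len(target) - 1 - i
        if fit_target(donor) > fit_target(target[slot]):
            target[slot] = list(donor)
    return target

def transfer_diversity(source, target, rng, rate=0.2):
    """Po -> Pa: copy a random fraction of genes from random source individuals to add diversity."""
    mixed = []
    for ind in target:
        donor = rng.choice(source)
        mixed.append([d if rng.random() < rate else g for g, d in zip(ind, donor)])
    return mixed

def emt_pu_sketch(fit_o, fit_a, dim=5, pop_size=10, generations=20, seed=0):
    rng = random.Random(seed)
    po = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    pa = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        po = evolve_one_generation(po, fit_o, rng)   # independent evolution
        pa = evolve_one_generation(pa, fit_a, rng)
        po = transfer_quality(pa, po, fit_a, fit_o)  # Pa improves quality in Po
        pa = transfer_diversity(po, pa, rng)         # Po promotes diversity in Pa
    return max(po, key=fit_o)

# Hypothetical stand-ins for "PU classifier performance" and "reliable-positive discovery".
def fit_o(w):
    return -sum(x * x for x in w)

def fit_a(w):
    return -sum((x - 0.1) ** 2 for x in w)

best = emt_pu_sketch(fit_o, fit_a)
```

In a real study the two fitness functions would wrap a PU classifier evaluated on the drug-target data, and the final individual would be scored on the held-out test set as the protocol specifies.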

Protocol 2: Multi-Objective Feature Selection with DREA-FS for Biomarker Discovery

Objective: Apply DREA-FS to a high-dimensional transcriptomics dataset (e.g., from The Cancer Genome Atlas - TCGA) to identify a Pareto-optimal set of non-dominated feature subsets (biomarker panels) that balance the number of genes and classification accuracy for a cancer subtype.

Materials & Dataset:

Dataset: A gene expression dataset with hundreds to thousands of features and labeled disease states.
Software: MATLAB or Python with multi-objective evolutionary algorithm capabilities.

Procedure:

Task Construction via Dimensionality Reduction:
- Task A (Filter-based): Create a simplified task using an improved filter method (e.g., mutual information) to select a subset of top-ranked features.
- Task B (Group-based): Create a complementary task by clustering features into groups and selecting representative features from each group [16].
Dual-Archive Optimization:
- Initialize a population for each task.
- Elite Archive: During evolution, non-dominated feature subsets from both tasks are stored here. This archive provides convergence guidance.
- Diversity Archive: This archive specifically stores and maintains feature subsets that have identical or very similar objective values (e.g., same accuracy and feature subset size) but consist of different features. This preserves multimodal solutions [16].
Multitask Evolution with Knowledge Transfer:
- Evolve the populations for both tasks in parallel.
- Allow for knowledge transfer (e.g., through crossover) between individuals of Task A and Task B, facilitated by the dual-archive structure. The elite archive guides the search towards the Pareto front, while the diversity archive injects varied genetic material to maintain diversity.
Evaluation:
- After convergence, the output is the non-dominated solution set from the elite archive.
- Evaluate the hypervolume and diversity of the obtained Pareto front against other multi-objective feature selection algorithms.
- Analyze the different gene sets in the diversity archive that yield equivalent predictive performance to provide biological insights and alternative biomarker panels.
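
The dual-archive bookkeeping in Protocol 2 can be illustrated with a small update routine. This is a sketch under assumptions rather than the DREA-FS procedure from [16]: both objectives (classification error and subset size) are minimized, "equivalent performance" is simplified to exact equality of the objective tuple, and the function and variable names are hypothetical.

```python
def dominates(a, b):
    """True if objective tuple a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_archives(elite, diversity, features, objectives):
    """Insert one candidate. Both archives map frozenset(features) -> objective tuple."""
    features = frozenset(features)
    if features in elite or features in diversity:
        return
    if any(dominates(obj, objectives) for obj in elite.values()):
        return  # dominated candidates are discarded
    if any(obj == objectives for obj in elite.values()):
        # Same objective values as an elite member, different features: a multimodal solution.
        diversity[features] = objectives
        return
    # New non-dominated point: drop everything it dominates from both archives.
    for archive in (elite, diversity):
        for key in [k for k, obj in archive.items() if dominates(objectives, obj)]:
            del archive[key]
    elite[features] = objectives

elite, diversity = {}, {}
update_archives(elite, diversity, {1, 2}, (0.10, 2))     # enters the elite archive
update_archives(elite, diversity, {3, 4}, (0.10, 2))     # equivalent panel, stored for diversity
update_archives(elite, diversity, {1, 2, 3}, (0.10, 3))  # dominated, discarded
update_archives(elite, diversity, {5}, (0.20, 1))        # trade-off point, enters the elite archive
```

With noisy cross-validated accuracies, a tolerance on the objective comparison would likely be needed in place of exact equality.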

The Scientist's Toolkit: Research Reagent Solutions

Table 2: Essential Research Reagents and Computational Tools

Item Name	Function/Benefit	Example Context / Note
Evolutionary Multitasking Framework (e.g., EMT-PU)	Solves related tasks concurrently via knowledge transfer. Mitigates data scarcity.	PU Learning in drug-target interaction prediction [29].
Dual-Archive Mechanism	Separately manages convergence pressure and solution diversity.	Finding equivalent biomarker sets in DREA-FS [16].
Variable & Segmented Parameters	Enables adaptive tuning of convergence dynamics in real-time.	Predefined-time convergence in ZNNs for robotic control [52].
Bidirectional Knowledge Transfer	Allows for balanced improvement in quality and diversity between tasks.	Core component of the EMT-PU algorithm [29].
Dual-Perspective Reduction (Filter/Group)	Constructs simplified, complementary search spaces for complex problems.	Initial step in the DREA-FS methodology [16].

Workflow and Signaling Diagrams

Diagram 1: EMT-PU Experimental Workflow. This diagram outlines the protocol for implementing Evolutionary Multitasking for Positive and Unlabeled Learning, highlighting the parallel evolution of two tasks and their bidirectional knowledge transfer.

Diagram 2: DREA-FS Dual-Archive Optimization Logic. This diagram illustrates the flow of information and solutions between the two simplified tasks and the dual-archive system, which collaboratively balances convergence and diversity.

Benchmarking Performance: Rigorous Validation Against State-of-the-Art Methods

Standardized benchmarking provides the critical foundation for comparing algorithmic performance, driving scientific progress, and ensuring reproducible research in evolutionary computation. For the specialized domain of evolutionary multitasking, where solvers simultaneously address multiple optimization problems, rigorous benchmarking becomes particularly essential due to the complex interactions between tasks. The CEC 2025 Competition on Evolutionary Multi-task Optimization establishes comprehensive protocols specifically designed to address these complexities, creating a common ground for evaluating how effectively algorithms can transfer knowledge between tasks while preventing negative interference [12]. These standardized approaches enable meaningful comparisons between different multi-task optimization strategies and provide insights into their fundamental operational mechanisms.

The critical importance of such standardization is underscored by recent analyses revealing significant gaps in current benchmarking practices. Widely used synthetic benchmark suites often poorly reflect real-world problem structures, constraints, and information limitations, potentially leading to biased algorithm development and performance claims that fail to translate to practical applications [53]. The CEC 2025 competition protocols directly address these concerns by providing carefully designed test suites with controlled degrees of latent synergy between component tasks, enabling systematic evaluation of knowledge transfer capabilities in evolutionary multitasking [12].

Competition Benchmark Suites and Problem Formulations

The CEC 2025 competition formalizes two distinct but complementary benchmarking tracks, each with specialized test suites designed to probe different aspects of evolutionary multitasking capabilities. These suites enable rigorous evaluation of algorithmic performance across diverse problem characteristics and task relationships.

Table 1: CEC 2025 Competition Test Suite Overview

Test Suite	Problem Type	Number of Problems	Tasks per Problem	Key Performance Metric
MTSOO	Single-Objective	9 complex problems + 10 benchmark problems	2 (complex), 50 (benchmark)	Best Function Error Value (BFEV)
MTMOO	Multi-Objective	9 complex problems + 10 benchmark problems	2 (complex), 50 (benchmark)	Inverted Generational Distance (IGD)

Multi-Task Single-Objective Optimization (MTSOO) Test Suite

The MTSOO suite contains nineteen distinct benchmark problems specifically designed to evaluate single-objective continuous optimization in multitasking environments. Nine complex problems each consist of two single-objective continuous optimization tasks, while ten additional benchmark problems each contain fifty distinct single-objective tasks [12]. This hierarchical structure enables researchers to evaluate algorithm performance across different scales of multitasking, from paired task combinations to massive multi-task environments.

The component tasks within these problems exhibit controlled levels of commonality and complementarity in terms of global optimum locations and fitness landscape characteristics. This deliberate design allows for systematic investigation of how different types of relationships between tasks impact knowledge transfer effectiveness and overall algorithmic performance [12]. Each problem possesses different degrees of latent synergy between component tasks, enabling detailed analysis of which algorithmic strategies work best for specific types of task relationships.

Multi-Task Multi-Objective Optimization (MTMOO) Test Suite

The MTMOO suite extends the multitasking paradigm to multi-objective optimization, containing nineteen problems with similar structure to the MTSOO suite. Nine complex problems each consist of two multi-objective continuous optimization tasks, while ten benchmark problems each contain fifty multi-objective tasks [12]. This suite enables evaluation of how algorithms balance multiple competing objectives within each task while simultaneously transferring knowledge across tasks.

The multi-objective tasks feature controlled variation in their Pareto optimal solutions and fitness landscape characteristics, creating opportunities for knowledge transfer about Pareto front structures and shapes across related tasks. The problems are designed with varying degrees of latent synergy between component tasks, allowing researchers to investigate how multi-objective multitasking algorithms perform under different relationship scenarios [12].

Experimental Protocols and Evaluation Methodologies

The CEC 2025 competition establishes rigorous, standardized experimental protocols designed to ensure fair comparison, statistical significance, and reproducible results across all participating algorithms.

Table 2: Experimental Settings for CEC 2025 Competition

Parameter	MTSOO Settings	MTMOO Settings
Independent Runs	30 per problem	30 per problem
Random Seeds	Different seeds for each run	Different seeds for each run
Max FEs (2-task)	200,000	200,000
Max FEs (50-task)	5,000,000	5,000,000
Checkpoints (Z)	100 (2-task), 1000 (50-task)	100 (2-task), 1000 (50-task)
Performance Metric	Best Function Error Value (BFEV)	Inverted Generational Distance (IGD)

Execution and Data Collection Protocols

For each benchmark problem, algorithms must be executed for thirty independent runs employing different random seeds for pseudo-random number generators. The competition explicitly prohibits executing multiple sets of thirty runs and selectively reporting the best-performing set, ensuring unbiased performance assessment [12]. This rigorous approach ensures that reported results capture typical algorithmic performance rather than exceptional cases.

The competition employs distinct termination criteria based on problem complexity. For all 2-task benchmark problems, the maximum number of function evaluations (maxFEs) is set to 200,000, while for 50-task problems, this increases to 5,000,000 [12]. In the multitasking context, one function evaluation refers to calculating the objective function value of any component task without distinguishing between different tasks, creating a uniform computational budget measure across different multitasking scenarios.
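
The shared-budget rule above can be sketched with a counter that charges one function evaluation per call regardless of which task is evaluated. The class and function names, the toy sphere tasks, and the random-search solver are illustrative assumptions, and the budget is shrunk to 200 evaluations so the example runs quickly; the competition value for 2-task problems is 200,000.

```python
import random

class BudgetExhausted(Exception):
    """Raised when the shared evaluation budget has been used up."""

class SharedBudget:
    """Counts function evaluations across all component tasks of one problem."""
    def __init__(self, tasks, max_fes):
        self.tasks = tasks      # one objective callable per component task
        self.max_fes = max_fes
        self.used = 0

    def evaluate(self, task_index, x):
        if self.used >= self.max_fes:
            raise BudgetExhausted
        self.used += 1          # every call costs one FE, whichever task it belongs to
        return self.tasks[task_index](x)

def sphere(x):
    return sum(v * v for v in x)

def shifted_sphere(x):
    return sum((v - 1.0) ** 2 for v in x)

def random_search(seed, max_fes=200):
    """Toy solver: alternate between the two tasks until the shared budget runs out."""
    rng = random.Random(seed)   # a different seed for each independent run
    budget = SharedBudget([sphere, shifted_sphere], max_fes)
    best = [float("inf"), float("inf")]
    try:
        while True:
            task = budget.used % 2
            x = [rng.uniform(-5, 5) for _ in range(3)]
            best[task] = min(best[task], budget.evaluate(task, x))
    except BudgetExhausted:
        pass
    return best, budget.used

results = [random_search(seed) for seed in range(30)]  # thirty independent runs
```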

Performance Recording and Intermediate Results

Competition protocols require detailed recording of intermediate results at predefined computational checkpoints to enable thorough analysis of algorithmic convergence behavior. For the MTSOO track, the best function error value (BFEV) for each component task must be recorded when the number of function evaluations reaches k×maxFEs/Z, where k ranges from 1 to Z [12]. For 2-task problems, Z=100, resulting in 100 checkpoints, while for 50-task problems, Z=1000, resulting in 1000 checkpoints.
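
As a quick check of the checkpoint schedule, the snippet below computes the evaluation counts at which results are recorded and reads a best-so-far trace at those points. The helper names are hypothetical, and the trace layout (one best error value per evaluation) is an assumption made for illustration.

```python
def checkpoints(max_fes, z):
    """Evaluation counts k * max_fes / z for k = 1..z."""
    return [k * max_fes // z for k in range(1, z + 1)]

def record_at_checkpoints(best_so_far, cps):
    """best_so_far[i] is the best error after i + 1 evaluations; sample it at each checkpoint."""
    return [best_so_far[fe - 1] for fe in cps]

two_task = checkpoints(200_000, 100)        # every 2,000 evaluations
fifty_task = checkpoints(5_000_000, 1000)   # every 5,000 evaluations

# Tiny worked example: 10 evaluations, 5 checkpoints.
trace = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
sampled = record_at_checkpoints(trace, checkpoints(10, 5))
```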

For the MTMOO track, the inverted generational distance (IGD) values for each component task must be recorded at the same computational checkpoints [12]. IGD measures both convergence and diversity in multi-objective optimization: it averages, over a set of reference points sampled from the true Pareto front, the distance from each reference point to the nearest solution found by the algorithm, so lower values are better. All intermediate results must be saved in specifically formatted text files for automated evaluation and comparison.
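
A minimal IGD computation, using the standard definition (mean Euclidean distance from each reference point on the true Pareto front to its nearest obtained solution), might look as follows. The reference front here is a made-up three-point example, and any normalization the competition scripts may apply is not reproduced.

```python
import math

def igd(reference_front, obtained):
    """Mean distance from each reference point to its nearest obtained solution (lower is better)."""
    return sum(
        min(math.dist(r, s) for s in obtained) for r in reference_front
    ) / len(reference_front)

reference = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]

perfect = igd(reference, reference)       # every reference point is matched exactly
one_point = igd(reference, [(0.0, 1.0)])  # converged but with no spread along the front
```

The second call shows why IGD also penalizes poor diversity: a single solution lying exactly on the front still scores worse than a set that covers the front.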

The competition employs a sophisticated overall ranking criterion that considers algorithmic performance across all component tasks under varying computational budgets. Each component task in each benchmark problem is treated as an individual task, resulting in a total of 518 individual tasks for comprehensive evaluation [12]. For each algorithm, the median performance value (BFEV for MTSOO, IGD for MTMOO) over thirty runs is calculated at each checkpoint for every task.
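
The task count and the per-checkpoint median can be reproduced in a few lines. The figure of 518 is consistent with one suite's nine 2-task and ten 50-task problems; the data layout `runs[r][c]` (metric of run r at checkpoint c) is an assumption for illustration.

```python
from statistics import median

# 9 problems x 2 tasks + 10 problems x 50 tasks
n_individual_tasks = 9 * 2 + 10 * 50

def median_per_checkpoint(runs):
    """runs[r][c] is the metric of run r at checkpoint c; return the median across runs."""
    return [median(column) for column in zip(*runs)]

# Three runs and three checkpoints stand in for the thirty runs of the protocol.
example_runs = [
    [3.0, 2.0, 1.0],
    [5.0, 4.0, 0.0],
    [4.0, 3.0, 2.0],
]
medians = median_per_checkpoint(example_runs)
```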

To prevent deliberate algorithm calibration that specifically targets the ranking criterion, the precise mathematical formulation of the overall ranking criterion is not released until after the competition submission deadline [12]. This approach encourages development of generally robust multitasking algorithms rather than specialized solutions overly tuned to a specific evaluation metric.

Experimental Workflow and Benchmarking Process

The following diagram illustrates the complete experimental workflow prescribed by the CEC 2025 competition protocols, from problem selection to final performance evaluation:

Implementation Protocol for Multi-Task Single-Objective Optimization

For researchers implementing the MTSOO benchmarking protocol, the following detailed workflow ensures compliance with competition standards:

Successful implementation of the CEC 2025 benchmarking protocols requires specific computational tools and resources. The following table details the essential components of the benchmarking toolkit:

Table 3: Essential Research Reagents and Resources for Evolutionary Multitasking Benchmarking

Tool/Resource	Function/Purpose	Implementation Notes
Benchmark Problem Code	Provides standardized problem definitions	Downloaded from competition website [12]
Reference Algorithm Implementations	Baseline for performance comparison	MFEA provided as reference [12]
Performance Evaluation Scripts	Automated calculation of metrics	Custom implementation following competition specs
Statistical Analysis Framework	Comparison of results across runs	Recommended: 30 independent runs with different seeds [12]
Data Formatting Tools	Preparation of results for submission	Generates specifically formatted text files

Application to Evolutionary Multitasking Neural Network Training

The CEC 2025 benchmarking protocols provide an exemplary framework for evaluating evolutionary multitasking approaches to neural network training and architecture search. Recent advances in neuroevolutionary methods demonstrate the growing importance of multi-task optimization in deep learning, particularly for architecture search, hyperparameter optimization, and multi-task learning scenarios [5] [54] [55]. By applying the rigorous evaluation methodology outlined in the competition, researchers can obtain reliable, comparable results for neuroevolutionary algorithms across diverse neural architecture search benchmarks.

The competition's focus on knowledge transfer between related tasks directly aligns with central challenges in neural network research, where architectures and trained parameters from one task often provide valuable starting points for related tasks. Recent work on evolutionary bi-level neural architecture search demonstrates how multitasking principles can simultaneously optimize network architecture, weights, and biases using bi-level optimization strategies [5]. The CEC 2025 protocols provide the standardized evaluation framework needed to compare such approaches against traditional neural network training methods and other evolutionary strategies.

Furthermore, the competition's requirement for fixed algorithm parameters across all problems mirrors the practical need for robust neural architecture search methods that perform well across diverse datasets and application domains without extensive per-problem tuning. This constraint encourages development of generally effective neuroevolutionary methods rather than overly specialized solutions, potentially leading to more widely applicable neural network design automation [55].

Within the rapidly advancing field of artificial intelligence, Evolutionary Multitasking Neural Networks (EMT-NNs) represent a powerful paradigm that leverages knowledge transfer across related tasks to enhance learning efficiency and performance. The principal challenge in this domain lies in the rigorous and standardized evaluation of these algorithms. This application note provides a structured framework for assessing EMT-NNs by delineating key performance metrics, detailed experimental protocols, and essential research tools. Focusing on accuracy, convergence speed, and robustness, this guide aims to equip researchers with the methodologies necessary for comprehensive analysis and valid comparison of different multitasking strategies in evolutionary computation.

Core Performance Metrics for Evolutionary Multitasking

Evaluating Evolutionary Multitasking (EMT) algorithms requires a multi-faceted approach that captures not only the final solution quality but also the efficiency and stability of the optimization process. The following table summarizes the core metrics across the three primary dimensions of performance [56] [57] [58].

Table 1: Key Performance Metrics for Evolutionary Multitasking

Metric Category	Metric Name	Mathematical Formulation / Definition	Interpretation in EMT Context
Accuracy & Solution Quality	Multitask Accuracy (MTA)	For classification: \( \frac{\text{Correct Predictions across all tasks}}{\text{Total Predictions}} \) [58]	Measures overall correctness in classification-based MTO problems.
	Hypervolume (HV)	Volume of objective space dominated by the obtained Pareto front [57]	Quantifies convergence and diversity in multi-objective multitask optimization.
	Average Best Fitness (ABF)	\( \frac{1}{K} \sum_{k=1}^{K} f_k^{\text{best}} \) where \( K \) is the number of tasks [56]	Tracks the average quality of the best-found solution for each task.
Convergence Speed	Convergence Curve	Plot of best fitness value versus function evaluations (FEs) or generations [56] [57]	Visualizes the pace of performance improvement; steeper curves indicate faster convergence.
	Number of Function Evaluations to Target (NFE-T)	The count of FEs required to reach a pre-defined target fitness value.	A lower NFE-T indicates higher optimization efficiency and faster knowledge transfer.
	Effective Dimensionality Growth	Monitoring the expansion of a network's representational capacity during training [59]	Faster expansion in early training can indicate rapid feature formation and learning.
Robustness & Stability	Positive Transfer Rate (PTR)	The frequency with which cross-task knowledge transfer leads to performance improvement [56]	A higher PTR indicates more effective and beneficial knowledge sharing.
	Negative Transfer Incidence (NTI)	The frequency or impact of performance degradation due to inter-task transfer [56] [57]	A lower NTI signifies better management of dissimilar tasks and robust transfer policies.
	Performance Standard Deviation	\( \sigma = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \mu)^2} \) over multiple runs	A lower standard deviation in final performance indicates higher algorithmic stability.
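
Three of the formulas in the table can be written directly with the Python standard library. The function names and input layouts are illustrative assumptions; `statistics.stdev` uses the N-1 denominator that appears in the table.

```python
from statistics import stdev

def multitask_accuracy(correct_per_task, total_per_task):
    """MTA: correct predictions summed over all tasks divided by total predictions."""
    return sum(correct_per_task) / sum(total_per_task)

def average_best_fitness(best_per_task):
    """ABF: mean of the best fitness found for each of the K tasks."""
    return sum(best_per_task) / len(best_per_task)

def performance_std(final_values):
    """Sample standard deviation of final performance over N independent runs."""
    return stdev(final_values)

mta = multitask_accuracy([90, 80], [100, 100])   # 170 correct out of 200
abf = average_best_fitness([0.2, 0.4, 0.6])
sigma = performance_std([1.0, 2.0, 3.0, 4.0])    # sqrt(5/3)
```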

Experimental Protocols for Metric Evaluation

Protocol for Benchmarking Accuracy and Convergence

This protocol outlines the steps for evaluating the core performance of an EMT algorithm on standardized test suites.

Objective: To quantitatively assess the accuracy and convergence speed of an EMT algorithm against baseline methods.

Materials: Standard Multitask Optimization Benchmark Suite (e.g., CEC2017) [56], computing cluster node.

Procedure:

Experimental Setup: Select a set of related benchmark tasks (K). Configure the EMT algorithm and baseline algorithms (e.g., MFEA, MOMFEA) with controlled population sizes and maximum function evaluations (FEs) [56] [57].
Algorithm Execution: For each algorithm, execute a minimum of 30 independent runs to account for stochasticity. Per run, log the best fitness for each task at fixed intervals (e.g., every 100 FEs).
Data Collection: For each run, record:
- Final Average Best Fitness (ABF) for each task.
- The Number of Function Evaluations to Target (NFE-T) for a pre-set target fitness.
- The entire Convergence Curve data.
Post-Processing & Analysis:
- Calculate the mean and standard deviation of ABF and NFE-T across all runs.
- Perform statistical significance tests (e.g., Wilcoxon signed-rank test) to compare the algorithm's results with baselines.
- Plot average convergence curves for visual comparison of convergence speed [57].
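
The data-collection and post-processing steps above can be sketched as follows, assuming a minimization problem and a convergence curve logged every 100 FEs as in the protocol. The helper names are hypothetical.

```python
def nfe_to_target(curve, target, interval=100):
    """curve[i] is the best fitness logged after (i + 1) * interval FEs (minimization).
    Returns the first logged FE count at which the target is reached, or None if never."""
    for i, value in enumerate(curve):
        if value <= target:
            return (i + 1) * interval
    return None

def mean_curve(curves):
    """Average convergence curve across runs; all curves must share the same logging points."""
    return [sum(column) / len(column) for column in zip(*curves)]

run_a = [5.0, 2.0, 0.5, 0.1]
run_b = [3.0, 1.0, 0.5, 0.3]

reached = nfe_to_target(run_a, target=1.0)    # target first met at the third log point
missed = nfe_to_target(run_a, target=0.01)    # target never met in this run
average = mean_curve([run_a, run_b])
```

For the significance test, paired per-run results from two algorithms could be passed to `scipy.stats.wilcoxon`; runs that never reach the target need an explicit convention (for example, assigning the full budget) before NFE-T values are averaged.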

Protocol for Quantifying Knowledge Transfer Robustness

This protocol is designed to measure the effectiveness and safety of inter-task knowledge transfer, a critical aspect of EMT.

Objective: To measure the Positive Transfer Rate (PTR) and Negative Transfer Incidence (NTI) within an EMT algorithm.

Materials: A multi-task problem set with known or quantifiable inter-task similarities.

Procedure:

Transfer Tracking: Instrument the EMT algorithm's code to log all inter-task knowledge transfer events (e.g., solution migrations from a source task to a target task) [56].
Impact Assessment: For each transfer event occurring at generation g, compare the fitness of the target task before the transfer (at g) and after assimilation (at g+1).
Event Classification: Categorize each transfer event:
- Positive Transfer: Fitness of the target task improves.
- Negative Transfer: Fitness of the target task degrades.
- Neutral Transfer: No significant change in fitness.
Metric Calculation: After a complete run, calculate:
- PTR = (Number of Positive Transfers) / (Total Transfers)
- NTI = (Number of Negative Transfers) / (Total Transfers)
Validation: Correlate high PTR and low NTI with tasks known to have high similarity, and vice-versa, to validate the metric's sensibility [56].
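
The classification and metric-calculation steps reduce to counting labelled events. The sketch below assumes minimization, a log of (fitness before, fitness after) pairs for the target task, and a small tolerance to define "no significant change"; these conventions and the function names are assumptions, not part of [56].

```python
def classify_transfer(before, after, tol=1e-9):
    """Label one transfer event for a minimization task."""
    if after < before - tol:
        return "positive"
    if after > before + tol:
        return "negative"
    return "neutral"

def transfer_rates(events):
    """events: list of (fitness_before, fitness_after). Returns (PTR, NTI)."""
    labels = [classify_transfer(before, after) for before, after in events]
    total = len(labels)
    if total == 0:
        return 0.0, 0.0
    return labels.count("positive") / total, labels.count("negative") / total

logged = [(1.0, 0.8), (1.0, 1.2), (1.0, 1.0), (0.5, 0.4)]
ptr, nti = transfer_rates(logged)
```

Because neutral events count toward the denominator, PTR and NTI need not sum to one.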

Protocol for Analyzing Representational Dynamics

Inspired by recent findings on neural network training dynamics, this protocol investigates how the internal representations of an EMT model evolve.

Objective: To track the expansion of representational capacity during the training of an EMT neural network.

Materials: An EMT-NN model, high-frequency checkpointing tool (e.g., ndtracker [59]).

Procedure:

High-Frequency Checkpointing: Configure the training loop to save model state checkpoints at a high frequency (e.g., every 5-10 steps) instead of the conventional every 100 or 500 steps [59].
Dimensionality Calculation: At each checkpoint, compute the Effective Dimensionality of the model's activations for a fixed batch of data. This can be done via PCA on the activation matrices [59].
Phase Mapping: Plot the effective dimensionality against the training step. Identify key phases:
- Initial Collapse (0-300 steps): Dimensionality drops as the network restructures from random initialization.
- Expansion (300-5,000 steps): Dimensionality increases as the network builds new representational structures.
- Stabilization (5,000+ steps): Growth plateaus as architectural constraints bind [59].
Interpretation: Correlate the timing and magnitude of dimensionality "jumps" with performance improvements on the multitask loss. Faster, structured expansion may indicate more efficient learning of shared representations.
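
One common way to turn a PCA spectrum into a single "effective dimensionality" number is the participation ratio of the covariance eigenvalues. The sketch below uses that definition as an assumption; it is not the ndtracker API, and the measure used in [59] may differ.

```python
import numpy as np

def effective_dimensionality(activations):
    """Participation ratio (sum of eigenvalues)^2 / (sum of squared eigenvalues)
    of the covariance of an (n_samples, n_units) activation matrix."""
    x = np.asarray(activations, dtype=float)
    x = x - x.mean(axis=0, keepdims=True)
    cov = x.T @ x / (x.shape[0] - 1)
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)  # remove tiny negative round-off
    return float(eig.sum() ** 2 / (eig ** 2).sum())

rng = np.random.default_rng(0)

# Isotropic activations spread across all 8 units: value close to 8.
isotropic = effective_dimensionality(rng.standard_normal((5000, 8)))

# Activations confined to a single direction: value close to 1.
rank_one = effective_dimensionality(np.outer(rng.standard_normal(100), np.arange(1.0, 5.0)))
```

Calling this on the same fixed batch at every checkpoint would give the curve that the phase-mapping step plots against the training step; constant activations (zero variance) make the ratio undefined and should be handled separately.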

Workflow and Signaling Visualization

The following diagram illustrates the integrated experimental workflow for the comprehensive evaluation of an Evolutionary Multitasking system, incorporating the protocols defined above.

Diagram: Integrated Workflow for EMT Performance Evaluation. This diagram outlines the three-phase process for a comprehensive evaluation, from algorithm execution to final synthesis.

The core of many modern EMT algorithms, particularly those using neural network representations, involves a learned knowledge transfer policy. The diagram below models this process as a multi-role reinforcement learning system, addressing the fundamental questions of "where, what, and how" to transfer.

Diagram: Multi-Role RL System for Knowledge Transfer. This diagram visualizes a coordinated RL policy where specialized agents handle different aspects of the transfer decision, a key mechanism in advanced EMT like MetaMTO [56].

The Scientist's Toolkit: Research Reagent Solutions

This section details the essential computational "reagents" and tools required to conduct rigorous experiments in evolutionary multitasking.

Table 2: Essential Research Tools for Evolutionary Multitasking Experiments

Tool / Solution Name	Category / Type	Primary Function in Research
CEC2017/WCCI2020 Test Suite [56]	Benchmark Problems	Provides a standardized set of multitask optimization problems for fair algorithm comparison and validation.
MetaMTO Framework [56]	Algorithmic Framework	A meta-reinforcement learning framework for learning generalizable knowledge transfer policies in EMT.
Neural Dimensionality Tracker (NDT) [59]	Analysis Library	Enables high-resolution tracking of effective representational dimensionality during neural network training.
EMM-DEMS Algorithm [57]	Algorithm Implementation	A multiobjective multitask evolutionary algorithm using hybrid differential evolution for generating high-quality solutions.
Multi-Role RL Policy [56]	Transfer Control Policy	A learned policy comprising Task Routing, Knowledge Control, and Strategy Adaptation agents to automate transfer decisions.
Hybrid Differential Evolution (HDE) [57]	Search Operator	An offspring generation strategy that mixes mutation operators to balance global exploration and local exploitation.

Accurately predicting drug-target interactions (DTIs) is a critical challenge in computational drug discovery, with the potential to significantly reduce the decade-long, multi-billion-dollar drug development process [60]. While recent advances in deep learning have produced models with impressive benchmark performance, the true test of their value lies in their validation within practical, real-world contexts. This Application Note examines the performance of state-of-the-art DTI prediction methods, with a specific focus on how evolutionary multitasking principles can enhance model generalization and utility in translational research settings. We present structured quantitative comparisons, detailed experimental protocols, and essential research tools to empower researchers in implementing and validating these approaches.

Performance Benchmarks: Quantitative Comparison of State-of-the-Art Methods

Recent studies demonstrate significant advancements in DTI prediction capabilities, with several frameworks achieving exceptional performance on benchmark datasets. The table below summarizes the key performance metrics reported in recent high-performing studies.

Table 1: Performance benchmarks of recent DTI prediction models on public datasets

Model Name	Core Methodology	AUROC	AUPR	Key Advantages	Experimental Validation
Hetero-KGraphDTI [60]	Graph Neural Networks with Knowledge-Based Regularization	0.98	0.89	Integrates biomedical ontologies; interpretable attention weights	High proportion of novel DTI predictions confirmed experimentally
MVPA-DTI [61]	Heterogeneous Network with Multiview Path Aggregation	0.966	0.901	Molecular Attention Transformer for 3D drug features; Prot-T5 for protein sequences	38/53 candidate drugs predicted to interact with KCNH2 target (10 clinically used)
GRAM-DTI [62]	Adaptive Multimodal Representation Learning	Outperforms baselines across 4 datasets	-	Higher-order multimodal alignment; adaptive modality dropout	-
DHGT-DTI [63]	Dual-view Heterogeneous Network with GraphSAGE & Graph Transformer	-	-	Captures both local and global network structures	Case studies on 6 Parkinson's disease drugs

The high AUROC (Area Under the Receiver Operating Characteristic Curve) and AUPR (Area Under the Precision-Recall Curve) scores reported for Hetero-KGraphDTI and MVPA-DTI indicate substantial progress in the field's ability to accurately predict DTIs; GRAM-DTI and DHGT-DTI appear in the table without comparable numeric values. Particularly noteworthy is the performance of Hetero-KGraphDTI, which achieves an average AUROC of 0.98 and AUPR of 0.89, reportedly surpassing existing state-of-the-art methods by a considerable margin [60].
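
For readers less familiar with the two metrics, the sketch below computes them from scratch on a toy ranking. AUROC is computed as the fraction of positive-negative pairs ranked correctly; AUPR is approximated by average precision, which is one common estimator and an assumption here. The labels and scores are invented, not taken from any of the cited studies.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits

toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
roc = auroc(toy_labels, toy_scores)
ap = average_precision(toy_labels, toy_scores)
```

On heavily imbalanced DTI data, AUPR is usually the more demanding of the two, which is why both are reported side by side.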

Experimental Protocols for Practical Validation

Computational Prediction Protocol

Purpose: To provide a standardized methodology for implementing evolutionary multitasking-inspired DTI prediction using heterogeneous graph neural networks.

Materials: Drug chemical structures (SMILES/InChI), protein sequences (FASTA format), known DTIs (e.g., from DrugBank, BindingDB), biomedical ontologies (Gene Ontology, ChEBI), computational resources (GPU cluster recommended).

Procedure:

Data Curation and Integration
- Compile drug molecules and their structural descriptors (molecular fingerprints, graph representations)
- Collect target protein sequences and extract evolutionary features (PSI-BLAST, sequence embeddings)
- Integrate heterogeneous biological networks (drug-drug similarities, protein-protein interactions, disease associations)
- Annotate entities with biomedical ontology terms from Gene Ontology and DrugBank [60]

Evolutionary Multitasking Framework Setup
- Define primary task: Standard DTI classification
- Establish auxiliary task: Identification of additional reliable positive samples from unlabeled data [29]
- Implement bidirectional knowledge transfer mechanism between tasks
- Configure competition-based initialization for auxiliary task population [29]
Model Architecture Configuration
- Implement graph convolutional encoder with multi-layer message passing
- Incorporate attention mechanisms to weight edge importance
- Add knowledge-aware regularization to enforce biological plausibility [60]
- Design meta-path aggregation for heterogeneous networks [61]
Training with Adaptive Sampling
- Apply enhanced negative sampling strategy to address class imbalance [60]
- Implement adaptive modality dropout to handle varying modality informativeness [62]
- Utilize volume-based contrastive learning for multimodal alignment [62]
- Incorporate IC50 activity measurements as weak supervision when available [62]
Model Interpretation and Analysis
- Visualize attention weights to identify salient molecular substructures and protein motifs [60]
- Extract meta-path importance scores for biological insight [63]
- Analyze learned embeddings for functional clustering

Experimental Validation Protocol

Purpose: To experimentally confirm computationally predicted novel DTIs in a real-world drug discovery context.

Materials: Predicted drug-target pairs, appropriate cell lines, assay reagents, control compounds (known inhibitors/activators), laboratory equipment for chosen assay type.

Procedure:

Candidate Prioritization
- Rank predicted DTIs by interaction scores and biological plausibility
- Filter against known interactions in public databases
- Apply structural clustering to ensure chemical diversity
- Consider drug repurposing potential (approved drugs prioritized)
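The ranking and diversity steps above can be sketched as a greedy selection over fingerprint similarity. This is a minimal illustration, not a prescribed method: it assumes each compound is described by the set of on-bits of a binary fingerprint, uses Tanimoto similarity, and the compound names, scores, and the 0.6 similarity cutoff are all hypothetical.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pick_diverse(candidates, max_similarity=0.6):
    """Greedy selection: visit candidates from highest to lowest score and keep
    one only if it is not too similar to any compound already kept."""
    kept = []
    for name, score, bits in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(tanimoto(bits, k[2]) <= max_similarity for k in kept):
            kept.append((name, score, bits))
    return [name for name, _, _ in kept]


# Hypothetical candidates: (name, predicted interaction score, fingerprint on-bits)
candidates = [
    ("cmpd_A", 0.95, frozenset({1, 2, 3, 4})),
    ("cmpd_B", 0.93, frozenset({1, 2, 3, 4, 5})),  # close analogue of cmpd_A
    ("cmpd_C", 0.80, frozenset({7, 8, 9})),
]
selected = pick_diverse(candidates)
```

In practice the fingerprints would come from a cheminformatics toolkit such as RDKit, and the cutoff would be tuned to the chemical series under study.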

In Vitro Binding Assays
- Select appropriate assay format (SPR, FRET, TR-FRET, etc.) based on target class
- Express and purify recombinant target protein
- Source candidate compounds (commercial sources or custom synthesis)
- Perform concentration-response experiments to determine binding affinity
- Include appropriate positive and negative controls

Functional Activity Assessment
- Implement cell-based functional assays relevant to target biology
- Measure downstream signaling or phenotypic changes
- Determine potency (EC50/IC50) and efficacy (% maximum response)
- Assess selectivity through counter-screening against related targets

Validation in Disease-Relevant Models
- Evaluate confirmed hits in disease-specific cellular models
- Assess target engagement using cellular thermal shift assays (CETSA) or similar
- Proceed to in vivo validation for top candidates
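Potency and efficacy in the concentration-response steps above are commonly read off a four-parameter logistic (Hill) curve. The sketch below shows only the model itself, under the assumption of an agonist-style curve that rises with concentration; the function and parameter names are illustrative, and fitting the parameters to assay data would need a separate curve-fitting routine.

```python
def hill_response(conc, bottom, top, ec50, hill=1.0):
    """Four-parameter logistic (Hill) model.

    conc and ec50 share the same units; the response equals the midpoint
    between bottom and top when conc == ec50.
    """
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)


def percent_efficacy(top, bottom, ref_top, ref_bottom):
    """Efficacy as a percentage of a reference compound's response window."""
    return 100.0 * (top - bottom) / (ref_top - ref_bottom)


# Hypothetical compound with EC50 = 1 uM and Hill slope 1.3
midpoint = hill_response(1e-6, bottom=0.0, top=100.0, ec50=1e-6, hill=1.3)
```

For an inhibition assay the same expression applies with the roles of `top` and `bottom` swapped, in which case the midpoint concentration is reported as an IC50.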

Table 2: Key research reagent solutions for DTI prediction and validation

Resource Category	Specific Examples	Function and Application
Bioinformatics Databases	DrugBank, BindingDB, ChEMBL, PubChem	Source of known DTIs, compound structures, bioactivity data
Protein Resources	UniProt, PDB, AlphaFold DB	Protein sequences, structures, and functional annotations
Chemical Information	PubChem, ZINC, ChEMBL	Drug-like compounds for screening, structural descriptors
Omics Data Repositories	GEO, TCGA, GTEx	Disease context, expression patterns, pathway information
Biomedical Ontologies	Gene Ontology, ChEBI, MONDO	Semantic knowledge integration, biological reasoning
Software Frameworks	PyTorch Geometric, Deep Graph Library, RDKit	Graph neural network implementation, cheminformatics
Experimental Assay Kits	LanthaScreen, Tag-lite, SPR platforms	High-throughput binding and functional assays

Workflow Visualization: From Prediction to Validation

Figure 1: Integrated computational and experimental workflow for DTI prediction and validation. The diagram illustrates the flow from multimodal data integration through evolutionary multitasking optimization to experimental confirmation of predicted interactions.

The integration of evolutionary multitasking principles with modern graph representation learning has significantly advanced the state of DTI prediction, bridging the gap between computational models and practical drug discovery applications. The protocols and resources presented herein provide researchers with a comprehensive framework for implementing these approaches, with demonstrated success in real-world validation studies. As these methods continue to evolve, their ability to leverage heterogeneous biological knowledge while addressing fundamental challenges like label uncertainty will further accelerate the identification of novel therapeutic opportunities.

The pursuit of artificial intelligence (AI) systems capable of human-like multitasking represents a fundamental challenge and opportunity within computational intelligence. Unlike humans, who face considerable switching costs when interleaving problems, machines can fluidly transition between tasks and, crucially, transfer problem-solving knowledge among them [12]. Evolutionary Multitask Optimization (EMTO) has emerged as a powerful paradigm that operationalizes this principle, solving multiple optimization problems simultaneously by exploiting the synergies among them [10]. In large-scale scientific domains like drug development, where optimization problems are both computationally expensive and numerous, the computational efficiency of EMTO becomes paramount [64]. This analysis examines the cost-benefit calculus of evolutionary multitasking in large-scale scenarios, quantifying its efficiency gains and establishing rigorous protocols for its application in research and industry.

Core Concepts and Quantitative Landscape

Evolutionary Multitask Optimization (EMTO) is founded on the principle that concurrently solving multiple optimization tasks can be more efficient than tackling them in isolation, provided there exists latent similarity or complementarity between the tasks' fitness landscapes [10]. This approach is inspired by natural evolution, which simultaneously produces organisms skilled at surviving in diverse ecological niches, with genetic material evolved for one task often proving effective for another [12].

In practice, EMTO algorithms, such as the Multi-Factorial Evolutionary Algorithm (MFEA), maintain a unified population of individuals that are decoded and evaluated in the context of different tasks. Knowledge transfer is facilitated through specialized genetic operators, allowing discoveries in one task to inform and accelerate progress in others [10]. The efficacy of this paradigm is critically dependent on several mechanisms, including the dynamic calibration of knowledge transfer probability, the accurate selection of similar tasks for migration, and the mitigation of negative transfer through strategies like anomaly detection [10].
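The unified-population idea can be illustrated with a small sketch of MFEA-style assortative mating. It assumes individuals are encoded in a shared [0, 1] search space and tagged with a skill factor (the task they are evaluated on), and that cross-task crossover happens with a random mating probability `rmp`. Uniform crossover and Gaussian mutation are simple stand-ins chosen for brevity, not the operators of any specific published implementation.

```python
import random


def assortative_mating(p1, p2, rmp, rng):
    """Return two offspring as (genes, skill_factor) tuples.

    p1, p2: (genes, skill_factor) parents encoded in a unified [0, 1] space.
    rmp:    probability that parents from different tasks are allowed to mate.
    """
    genes1, task1 = p1
    genes2, task2 = p2
    if task1 == task2 or rng.random() < rmp:
        # Crossover; when the tasks differ this is where knowledge transfers.
        mask = [rng.random() < 0.5 for _ in genes1]
        c1 = [a if m else b for a, b, m in zip(genes1, genes2, mask)]
        c2 = [b if m else a for a, b, m in zip(genes1, genes2, mask)]
        # Each child inherits the skill factor of a randomly chosen parent.
        return [(c1, rng.choice((task1, task2))), (c2, rng.choice((task1, task2)))]

    def mutate(genes):
        # Small Gaussian perturbation, clipped back into the unified space.
        return [min(1.0, max(0.0, g + rng.gauss(0.0, 0.05))) for g in genes]

    # No transfer: each parent is mutated and stays within its own task.
    return [(mutate(genes1), task1), (mutate(genes2), task2)]


rng = random.Random(0)
children = assortative_mating(([0.0] * 4, 0), ([1.0] * 4, 1), rmp=0.3, rng=rng)
```

Adaptive schemes of the kind discussed above would replace the fixed `rmp` with a value that is updated during the run, for example from the observed success of cross-task offspring.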

The computational expense of real-world problems, such as those in drug development, underscores the value of EMTO. These are often Expensive Multitasking Optimization Problems (EMTOPs), where a single function evaluation (a simulation or physical experiment) can take hours or even days [64]. In such contexts, the ability of EMTO to reduce the total number of required evaluations through inter-task knowledge transfer offers significant potential for resource savings and acceleration of research timelines.

Table 1: Computational Cost Spectrum of Model Training (Adapted from [65])

Model Type	Estimated Cost (USD)	Training Time	Hardware Requirements
Small CNN (Image Classification)	$50 - $200	2 - 8 hours	Consumer GPU
Medium Transformer (Text Processing)	$1,000 - $5,000	1 - 3 days	Cloud GPUs
Large Language Model	$100,000 - $1,000,000+	Weeks to Months	Distributed GPU Clusters
State-of-the-Art Models (e.g., Gemini Ultra)	Up to $191 million	Extensive	Massive Distributed Infrastructure

The Scientist's Toolkit: Key Research Reagents & Frameworks

Successful implementation of evolutionary multitasking research requires a suite of software frameworks and algorithmic components. The selection of an appropriate deep learning framework is often the first critical decision, as it forms the foundation for building and training neural network models [18].

Table 2: Essential Research Reagents for Evolutionary Multitasking

Category	Item	Function & Application
Core AI Frameworks	PyTorch [18] [66]	A flexible, Pythonic framework with dynamic computation graphs, ideal for research prototyping and rapid experimentation.
Core AI Frameworks	TensorFlow [18] [67]	A highly scalable, production-ready framework with strong deployment tools (e.g., TensorFlow Lite, TensorFlow Serving).
Core AI Frameworks	JAX [18]	A high-performance framework for scientific computing, combining a NumPy-like API with automatic differentiation and hardware acceleration.
Specialized Libraries	Hugging Face Transformers [18] [66]	Provides thousands of pre-trained models (e.g., BERT, GPT) for NLP and beyond, simplifying transfer learning and fine-tuning.
Specialized Libraries	DeepSpeed [18]	An optimization library from Microsoft that enables efficient training of extremely large models via memory optimization and 3D parallelism.
Algorithmic Components	CMA-ES [64]	A robust evolutionary strategy for continuous optimization, often used as a core solver within surrogate-assisted EMTO.
Algorithmic Components	Support Vector Classifier (SVC) [64]	Used in classifier-assisted EMTO to prescreen candidate solutions, reducing the need for expensive function evaluations.
Benchmarking Resources	CEC 2025 MTO Test Suites [12]	Standardized benchmark problems for Multi-Task Single-Objective and Multi-Task Multi-Objective Optimization for performance evaluation.

Quantifying Efficiency: Data from Advanced Algorithms

Recent algorithmic advances demonstrate the tangible efficiency gains achievable through sophisticated EMTO methods. The performance of these algorithms is typically measured by their convergence speed and the final solution quality achieved under a limited computational budget (e.g., a maximum number of function evaluations).

The MGAD (Multiple similar sources and anomaly Detection) algorithm addresses key challenges in EMTO, such as dynamic process control and negative knowledge transfer. It employs an enhanced adaptive knowledge transfer probability strategy and an anomaly detection-based transfer mechanism. In comparative experiments, MGAD demonstrated "strong competitiveness in convergence speed and optimization ability" compared to other state-of-the-art algorithms [10].

For expensive optimization problems, the Classifier-Assisted Evolutionary Multitasking Optimization algorithm (CA-MTO) offers a distinct efficiency advantage. By using a Support Vector Classifier (SVC) as a surrogate to prescreen solutions, it drastically reduces the number of costly function evaluations. Integrated with the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), this approach shows "significant superiority over general CMA-ES in terms of both robustness and scalability." Furthermore, its knowledge transfer strategy, which enriches training samples for each task's classifier by sharing high-quality solutions across tasks, provides an additional "competitive edge over some state-of-the-art algorithms on expensive multitasking optimization problems" [64].

In the related field of Multi-Task Learning (MTL) for deep learning, a key insight reveals that optimization imbalance is strongly correlated with the norm of task-specific gradients. A straightforward strategy that scales task losses according to their gradient norms can achieve performance comparable to an extensive and computationally expensive grid search for optimal weights, representing a significant reduction in tuning costs [68].
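A minimal sketch of gradient-norm-based loss scaling follows. The source does not spell out the exact scaling rule, so this assumes the simplest variant: each task weight is set inversely proportional to that task's gradient norm, so that all weighted gradients end up with the same magnitude. The function name and the toy gradients are illustrative.

```python
import math


def grad_norm_weights(task_grads, eps=1e-12):
    """Return one loss weight per task so that weighted gradient norms match.

    task_grads: list of flat gradient vectors (one per task) with respect to
                the shared parameters.
    """
    norms = [math.sqrt(sum(g * g for g in grad)) for grad in task_grads]
    target = sum(norms) / len(norms)  # common norm every task is scaled to
    return [target / (n + eps) for n in norms]


# Toy example: task 0 dominates with a gradient ten times larger than task 1.
grads = [[3.0, 4.0], [0.3, 0.4]]
weights = grad_norm_weights(grads)
# The combined objective would then be sum(w * loss) over the tasks.
```

Because the weights are recomputed from quantities already available during training, this avoids the repeated full training runs that a grid search over loss weights requires.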

Figure: Evolutionary Multitasking Optimization Workflow

Detailed Experimental Protocols

Protocol 1: Benchmarking EMTO Algorithm Performance

This protocol outlines the standardized procedure for evaluating the performance and computational efficiency of EMTO algorithms using established benchmark suites, as defined by the CEC 2025 competition guidelines [12].

1. Experimental Setup & Resource Allocation

Benchmark Selection: Utilize the test suites from the CEC 2025 Competition on Evolutionary Multi-task Optimization. This includes:
- The Multi-Task Single-Objective Optimization (MTSOO) suite, containing nine 2-task problems and ten 50-task problems.
- The Multi-Task Multi-Objective Optimization (MTMOO) suite, with a similar structure.
Computational Budget: For 2-task benchmark problems, set the maximum number of function evaluations (maxFEs) to 200,000 per run. For 50-task benchmark problems, set maxFEs to 5,000,000 per run. Each evaluation of any component task's objective function counts as one function evaluation.
Statistical Rigor: Execute 30 independent runs of the algorithm per benchmark problem. Each run must employ a different random seed. It is prohibited to execute multiple sets of 30 runs and selectively report the best one.
Parameter Configuration: The parameter settings of the algorithm must remain identical for all benchmark problems within a test suite (MTSOO or MTMOO). All parameter settings must be fully reported in the final submission.

2. Data Acquisition & Performance Recording

Intermediate Checkpoints: During execution, record the best function error value (BFEV) for each component task when the number of function evaluations reaches predefined checkpoints. For 2-task problems, use Z=100 checkpoints (k*maxFEs/100 for k=1 to 100). For 50-task problems, use Z=1000 checkpoints.
Data Logging: Save intermediate results into separate .txt files for each benchmark problem. The file should be structured with the first column containing the function evaluation count at each checkpoint, followed by columns for the BFEV for each task across all 30 runs.
Final Performance Calculation: After all runs are complete, calculate the median BFEV over the 30 runs at each checkpoint for every individual task. This data forms the basis for the overall ranking criterion, the precise formulation of which is defined by the competition organizers.
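The checkpoint schedule and the per-checkpoint median can be sketched as follows. The helper names are illustrative, and the sketch assumes the 50-task checkpoints follow the same k*maxFEs/Z rule stated for the 2-task case; the official ranking criterion itself is not reproduced here.

```python
from statistics import median


def checkpoints(max_fes, z):
    """Function-evaluation counts at which BFEV is recorded: k*maxFEs/Z, k=1..Z."""
    return [k * max_fes // z for k in range(1, z + 1)]


def median_bfev(runs):
    """Median BFEV per checkpoint for one task.

    runs[r][c] is the best function error value of run r at checkpoint c.
    """
    return [median(column) for column in zip(*runs)]


two_task_schedule = checkpoints(200_000, 100)      # 2,000, 4,000, ..., 200,000
fifty_task_schedule = checkpoints(5_000_000, 1000)  # 5,000, 10,000, ..., 5,000,000

# Toy data: 3 runs, 2 checkpoints (a real experiment would have 30 runs).
toy_runs = [[3.0, 1.0], [2.0, 0.5], [9.0, 0.1]]
toy_median = median_bfev(toy_runs)
```

Plotting `toy_median` against the matching schedule gives the convergence curve used in the analysis step.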

3. Analysis & Interpretation

Convergence Speed: Plot the median BFEV against the number of function evaluations to visualize and compare the convergence rate of different algorithms.
Solution Quality: Compare the final median BFEV values achieved at maxFEs to assess the optimization precision of the algorithms.
Algorithm Ranking: Apply the official competition ranking criterion, which considers algorithm performance on each component task across all computational budgets, to obtain an overall efficiency score.

Protocol 2: Implementing Classifier-Assisted EMTO for Expensive Problems

This protocol details the methodology for applying a classifier-assisted approach (e.g., CA-MTO [64]) to solve expensive multitasking problems, where surrogate models are used to reduce computational costs.

1. Problem Formulation & Algorithm Selection

Problem Identification: Define the set of K computationally expensive optimization tasks to be solved simultaneously. These are characterized by objective functions that require minutes to hours to evaluate (e.g., complex simulations).
Base Solver: Select a robust evolutionary algorithm as the core optimizer, such as the Covariance Matrix Adaptation Evolution Strategy (CMA-ES).
Surrogate Model: Choose a classification model to act as the surrogate. The Support Vector Classifier (SVC) is a suitable candidate due to its efficiency and robustness.

2. System Initialization & Training

Initial Sampling: For each task, generate an initial small population of solutions and evaluate them using the expensive true objective function. This creates a labeled dataset (solution, fitness) for each task.
Classifier Training: Train a separate SVC for each task using its initial dataset. The classifier learns to predict whether a candidate solution is better or worse than a reference point (e.g., the parent), effectively modeling the direction of improvement rather than the exact fitness value.
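The labeling idea can be made concrete with a small sketch. It assumes minimization and labels each evaluated solution 1 if it beats a reference fitness (here the parent's) and 0 otherwise. To stay dependency-free, a one-nearest-neighbour classifier is used as a stand-in for the SVC; it is not the classifier described in [64], and the sphere objective and sample points are hypothetical.

```python
import math


def label_against_reference(evaluated, ref_fitness):
    """Turn (solution, fitness) pairs into (solution, label) pairs.

    Label 1 means 'better than the reference' under minimization, 0 otherwise.
    """
    return [(x, 1 if f < ref_fitness else 0) for x, f in evaluated]


class NearestNeighbourClassifier:
    """Stand-in for the SVC: predicts the label of the closest training point."""

    def fit(self, samples):
        self.samples = list(samples)
        return self

    def predict(self, x):
        return min(self.samples, key=lambda s: math.dist(s[0], x))[1]


def sphere(x):
    """Hypothetical stand-in for an expensive objective function."""
    return sum(v * v for v in x)


# Initial sample evaluated with the true objective.
initial = [[0.1, 0.1], [0.2, -0.1], [1.5, 1.0], [-1.2, 1.4]]
evaluated = [(x, sphere(x)) for x in initial]
parent_fitness = sphere([0.5, 0.5])

clf = NearestNeighbourClassifier().fit(
    label_against_reference(evaluated, parent_fitness)
)

# Prescreening: only offspring predicted as improvements would be evaluated.
offspring = [[0.0, 0.0], [1.4, 1.2]]
promising = [x for x in offspring if clf.predict(x) == 1]
```

Predicting a better/worse label is an easier learning problem than regressing the exact fitness, which is why a classifier can be useful even with very few evaluated samples.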

3. Knowledge Transfer & Evolutionary Loop

Subspace Alignment: Implement a knowledge transfer strategy based on Principal Component Analysis (PCA). For each task, create a low-dimensional subspace from its current population of high-quality solutions.
Sample Aggregation: Learn an alignment matrix to transform and aggregate labeled samples from all related tasks into a unified, enriched dataset for each task-specific classifier.
Classifier-Assisted Evolution:
- Prescreening: For each task, use its knowledge-augmented SVC to prescreen newly generated offspring solutions. This predicts which offspring are promising without invoking the expensive function evaluator.
- Selective Evaluation: Only the offspring deemed promising by the classifier are evaluated with the true, expensive objective function.
- Database Update: Add the newly evaluated (solution, fitness) pairs to the training datasets for all tasks, and periodically retrain the SVC models to improve their accuracy.
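The subspace alignment and sample aggregation steps above can be sketched with NumPy. This is one plausible reading rather than the exact procedure of [64]: it assumes each task's subspace is spanned by the top-d principal directions of its samples, takes the alignment matrix as the product of the two bases (as in classical subspace alignment), and maps the aligned source samples back into the target task's decision space so they can be pooled with its own training data. All function names are illustrative.

```python
import numpy as np


def pca_basis(samples, d):
    """Top-d principal directions of row-wise samples, as columns of a (D, d) matrix."""
    centred = samples - samples.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[:d].T


def align_source_to_target(source, target, d):
    """Express source-task samples in the target task's d-dimensional subspace.

    Returns an array with the same shape as `source`, located in the target
    task's decision space.
    """
    ps, pt = pca_basis(source, d), pca_basis(target, d)
    m = ps.T @ pt  # (d, d) alignment matrix between the two subspaces
    coords = (source - source.mean(axis=0)) @ ps @ m
    return coords @ pt.T + target.mean(axis=0)


rng = np.random.default_rng(0)
task_a = rng.normal(size=(20, 3))        # evaluated solutions of a source task
task_b = rng.normal(size=(15, 3)) + 5.0  # evaluated solutions of the target task

transferred = align_source_to_target(task_a, task_b, d=2)
# Enriched training set for the target task's classifier (labels travel with rows).
pooled = np.vstack([task_b, transferred])
```

Each transferred row keeps the better/worse label it received in its source task, which is what lets it serve as an extra training sample for the target task's classifier.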

4. Validation & Stopping Criteria

Performance Monitoring: Track the best-found solution for each task over generations. The algorithm is terminated when a predefined computational budget (e.g., a maximum number of true function evaluations) is exhausted, or when performance plateaus.
Result Reporting: The final output is the set of best-known solutions for all K tasks, along with the total computational cost (in terms of true function evaluations) incurred.

Figure: Classifier-Assisted Multi-Task Optimization (CA-MTO)

The computational efficiency of Evolutionary Multitask Optimization is not merely theoretical but is being quantitatively demonstrated through advanced algorithms like MGAD and CA-MTO, which dynamically manage knowledge transfer and leverage surrogate models to minimize expensive evaluations [10] [64]. The protocols and analyses presented provide a framework for researchers, particularly in fields like drug development, to rigorously assess the cost-benefit profile of EMTO in their specific large-scale scenarios. As the field progresses, the fusion of multitasking paradigms with sophisticated deep-learning frameworks and efficient resource management strategies will be crucial for tackling the next generation of computationally intensive problems, ultimately accelerating the pace of scientific discovery and innovation.

Conclusion

Evolutionary multitasking offers a promising route to optimizing neural networks in computationally intensive fields like drug discovery. By optimizing related tasks simultaneously and transferring knowledge among them, EMT frameworks can accelerate convergence, improve solution quality, and broaden the exploration of complex biological search spaces. Three takeaways stand out: knowledge transfer mechanisms must be designed to avoid negative transfer; dual-population and self-adjusting architectures help maintain diversity; and EMT has performed competitively in benchmarks and in real-world applications such as feature selection and drug-associated prediction. Future work should focus on scaling EMT to dozens or even hundreds of concurrent tasks, integrating it more deeply with large language models for heuristic design, and developing more robust, automated task-similarity measures. For biomedical research, wider adoption of EMT could substantially reduce the time and cost of in-silico drug screening and multi-omics analysis, shortening the path from target identification to viable therapeutic candidates.