1 Introduction

The production of cars in Europe rely on a “built-to-order” system, where cars with specific options need to be assembled in relatively short periods of time. The cars to be assembled are typically diverse in respect to their required options, and hence, the assembly process is necessarily slow. Since 1993, the car manufacturer, Renault, has been tackling a variant of “car sequencing” with the aim of assembling a range of diverse cars as efficiently as possible (Solnon et al., 2008). In 2005, the subject of the ROADEF challenge was the same car sequencing problem (Nguyen & Cung, 2005), which is one of finding an optimal order, or sequence, in which cars are to be processed on a production line. Each car has a set of options such as paint colour, air conditioning, etc. Each option has an associated constraint that gives the maximum number of cars that can have this option applied in any contiguous subsequence. The car sequencing problem is known to be NP-hard (Kis, 2004).

There are many variants of car sequencing problems that all share common characteristics but with small differences in the objective function or constraint. A comprehensive study of all variants is beyond the scope of a single paper. Here we will focus on one popular optimization variant of the car sequencing problem which minimizes the violation of constraints on the occurrence of options within subsequences (Bautista et al., 2008; Thiruvady et al., 2011, 2014). The first study in this direction was conducted by Bautista et al. (2008), who showed that a beam search approach can be very effective. Following this study, Thiruvady et al. (2011) showed that a hybrid of beam search, ant colony optimization (ACO) and constraint programming can substantially improve solution qualities while reducing run-times. Thiruvady et al. (2014) proposed a hybrid of Lagrangian relaxation and ACO, and they showed the hybrid can find excellent solutions and at the same time very good lower bounds. A recent study in this direction is by Thiruvady et al. (2020), who showed that a large neighbourhood search (LNS) finds high quality solutions in relatively low run-times. Other methods used for this type of problem include beam search (Golle et al., 2015) and a hybrid neighbourhood search method combining tabu search and LNS (Zhang et al., 2018).

In the literature, there are three commonly used benchmarks (Gent, 1998; Gravel et al., 2005; Perron & Shaw, 2004a). The benchmark instances from Perron & Shaw (2004a) consisted of up to 500 cars with relatively high utilisation of options, and were considered as the “hardest” set of problem instances available to test algorithms against. Though, since then, studies (Thiruvady et al., 2011, 2014, 2020) have demonstrated different solvers (including those based on mixed integer programming (MIP), constraint programming and metaheuristics) have variable performance, and hence it is unclear as to which set of problem instances are in fact the most complex. Moreover, the specific characteristics of hard problem instances have not clearly been identified.

The aims of this study are twofold: (a) to address the gap of a lack of knowledge as to which characteristics of the car sequencing problem make finding high quality solutions hard, and (b) to use machine learning to build an algorithm selection model. The first aim is achieved by performing an instance space analysis for the problem. The methodology of instance space analysis was originally proposed by Smith-Miles et al. (2014), and has been successfully applied to many problems including combinatorial optimization (Smith-Miles et al., 2014), continuous optimization (Muñoz & Smith-Miles, 2017), machine learning classification (Muñoz et al., 2018), regression (Muñoz et al., 2021) and facial age estimation (Smith-Miles & Geng, 2020). However, it still requires problem-specific design to apply this general approach to the specific problem of car sequencing, such as feature extraction and instance generation.

To carry out an instance space analysis, we extract a vector of problem features to characterise a car sequence problem instance, and use principal component analysis to reduce the dimension of feature vectors into two. By doing so, an instance is projected to be a point in the 2-D instance space, allowing a clear visualization of the distribution of instances, their feature values and algorithms’ performance. Our analysis reveals two problem characteristics that make a car sequencing instance hard to solve, i.e., high utilisation of options and large average number of options per car class. Further, this analysis allows us to develop a problem instance generator, which can generate problem instances that can be considered extremely complex irrespective of the type of solver.

The second aim, building an algorithm selection model, is achieved by using machine learning (Rice, 1976; Smith-Miles, 2009; Muñoz et al., 2015) to identify which algorithm is most effective in a particular region of the instance space. We select three state-of-the-art algorithms from the literature as our candidates: the Lagrangian relaxation and ACO hybrid (Thiruvady et al., 2014) and two LNS algorithms proposed by Thiruvady et al. (2020). Additionally, given the vast improvement in commercial solvers, we investigate three algorithms not previously successful or attempted in the literature. These algorithms are MIP, MIP based on lazy constraints, and an adaptive LNS that builds on algorithms studied in Thiruvady et al. (2020).

In summary, the contributions of this study are:

  1. 1.

    Introducing two new optimization algorithms for the car sequencing problem: an adaptive LNS method and a MIP method with lazy constraints. We also demonstrate for the first time that with suitable tuning of parameters, modern MIP solvers are competitive for this problem.

  2. 2.

    Performing an instance space analysis for the car sequence problem and successfully identifying the characteristics that make an instance hard to solve.

  3. 3.

    Designing a benchmark to allow systematic generation of problem instances with controllable properties and showing that they complement the existing benchmark instances well in the instance space.

  4. 4.

    Developing an algorithm selection model based on machine learning, which shows that our adaptive LNS and the MIP solver based on Gurobi become the new state-of-the-art for solving the car sequencing problem. Further, the adaptive LNS performs the best for solving hard problem instances, while the MIP solver is the best for medium-hard instances.

This study is different to many others published in the literature in not simply identifying one new or improved algorithm but advancing the state-of-the-art while giving insight into what types of algorithms are best suited for different types of instances. The analysis of the instance space and new benchmark instances are expected to pave the way for future research in designing better algorithms for car sequencing problems.

The remainder of this paper is organized as follows. In Sect. 2, we introduce the methodology of algorithm selection and instance space analysis. In Sect. 3, we describe the car sequencing problem and present the MIP formulation used. Section 4 introduces the solution algorithms for the car sequencing problem, and Sect. 5 describes feature extraction and instance generation. Section 6 presents and analyses the experimental results, and the last section concludes the paper.

2 Algorithm selection and instance space analysis

The algorithm selection framework was originally proposed by (Rice, 1976). It has four key components:

  • Problem space (P) consists of instances from the problem of interest.

  • Algorithm space (A) contains algorithms available to solve the problem.

  • Feature space (F) contains measurable features extracted to characterise a problem instance.

  • Performance space (Y) consists of measures to evaluate the performance of algorithms when solving the instances, e.g., the optimality gap generated by an algorithm within a time limit.

For an instance \(x\in P\), a set of features f(x) can be first extracted. The goal of algorithm selection is to learn a mapping from the feature vector \(f(x) \in F\) to an algorithm \(\alpha ^* \in A\), such that the algorithm \(\alpha ^*\) performs the best on the instance x when compared to other algorithms in A. This is typically achieved by training a machine learning model on the data composed of the feature vectors of training instances and the performances of candidate algorithms on the training instances. Algorithm selection is closely related to meta-learning (Vanschoren, 2018) and auto machine learning (He et al., 2021), which aims to select the best parameter settings for machine learning models.

Algorithm selection is often beneficial for problem solving. For example the algorithm selection model, SATzilla-07, was able to solve in excess of 20% more instances in the 2007 SAT competition than any of its component solvers (Xu et al., 2008). Apart from the SAT problem, algorithm selection has been demonstrated successful on quantified Boolean formulas (Pulina & Tacchella, 2009), constraint solving (O’Mahony et al., 2008), answer set programming (Hoos et al., 2014), continuous black-box optimization (Kerschke & Trautmann, 2019) and clustering (Wang et al., 2020), to name a few. Some of those algorithm selection models were collected in the ASlib benchmark library (Bischl et al., 2016), and more comprehensive literature reviews are available in (Smith-Miles, 2009; Muñoz et al., 2015; Kerschke & Trautmann, 2019).

Fig. 1
figure 1

The methodological framework of algorithm selection and instance space analysis. Note that the reduced 2-D space is only used for visualization, and the algorithm selection models are built on the feature space F

Smith-Miles et al. (2014) further extended Rice’s algorithm selection framework by including a visualization component (See Fig. 1). A significant advance of the extended algorithm selection framework is that it provides great insights into the strengths and weaknesses of an algorithm as well as problem hardness. A 2-D instance space can be created and visualised by using dimension reduction techniques such as principal component analysis to reduce the dimension of feature vectors to two. Importantly, the performance of an algorithm can also be visualized in the 2-D space, clearly showing in which region of the instance space the algorithm performs well or badly. Finally, by visualizing the values of features in the same space, one can identify the features that are mostly correlated with the performance of algorithms, indicating which features make an instance hard to solve.

Despite the existence of many studies in this area, the existing algorithm selection models are all problem-specific, and thus cannot be directly applied to car sequencing problems. In the following, we will perform an instance space analysis and build algorithm selection models for the car sequencing problem.

3 The car sequencing problem

In this section, we describe the optimization version of the car sequencing problem, originally introduced by Bautista et al. (2008).

Consider D cars, each of which belongs to a class \(C=\lbrace c_{1}, \ldots , c_{K}\rbrace \), where K is the number of classes. The cars require one or more of a set of options \(O=\lbrace o_{1}, o_{2}, \ldots , o_{O}\rbrace \) that need to be installed. All cars within a class must be fitted with the same options. For this purpose, each car \(1,\ldots ,D\) has an indicator vector associated with it

$$\begin{aligned} \mathbf {r_{i}}[j]={\left\{ \begin{array}{ll} 1&{}\text {if car } i \text { requires option }j,\\ 0&{}\text {otherwise} \end{array}\right. }\qquad \text { for }1\le j \le O. \end{aligned}$$
(1)

Cars in the same class have the same indicator vector, or alternatively two cars i and k are in the same class if \(\mathbf {r_{i}}=\mathbf {r}_{k}\). Moreover, the number of cars in class j is denoted as \(d_{j}\).

Typically, each station that installs the options along the assembly line has a certain capacity, which if not met, will slow down the assembly line. Hence, to satisfy aspirational targets, we introduce for each option \(p_{j}\) and \(q_{j}\) (\(j=1,\ldots ,O\)), where in any subsequence of \(q_j\) cars, there should only be \(p_j\) cars that require option j.

Any sequence or ordering of the cars is a potential solution to the problem. We denote such a sequence by \(\pi \), and \(\pi _{k}\) is used to represent car class c in position k of \(\pi \). As any permutation between cars in the same class produces an equivalent solution, we eliminate such symmetries in the problem in order to reduce the search space.

A sequence that satisfies all subsequence constraints for all options can be considered as ideal. Though, such a sequence can be hard to find, and hence the optimization version of the problem focuses on modulating option utilisation across the sequence. The measures that capture the modulation of options are presented next.

3.1 Assignment measures

An option \(o_{i}\) with subsequence length \(q_{i}\) will have \(D-q_{i}+1\) subsequences. Those subsequences which have used \(o_{i}\) in excess of the allowable \(p_{i}\) options, can be computed as follows:

$$\begin{aligned} y_{ij}(\pi )=\max \biggl \lbrace 0, - p_{i} + \sum _{k=u_{i}(j)}^{j} r_{\pi (k)}[i] \biggr \rbrace \end{aligned}$$
(2)

where \(u_{i}(j)=\max (1, j+1-q_{i})\). Here \(u_{i}(j)\) is the starting position of a subsequence \(q_{i}\), which ends at position j. The inequality \(j<q_{i}\) implies that the subsequence starts at position 1, resulting in the subsequence size being less than \(q_{i}\).

We now define upper over-assignment (uoa) (Bautista et al., 2008), which computes the total number of times within a sequence \(\pi \) that an option \(o_{i}\) violated the limit \(p_{i}\):

$$\begin{aligned} uoa(\pi )=\sum _{i=1}^{O}\sum _{j=p_{i}}^{D}a_{ij}y_{ij}(\pi ) \end{aligned}$$
(3)

where \(a_{ij}\) is a penalty imposed for exceeding \(p_j\) in subsequence positions \(j+1-q_{i}\) to j (if \(j<q_{i}\), positions 1 to j). If \(p_j\) is never violated, then \(uoa(\pi )=0\), and \(\pi \) is considered as a feasible solution.

The upper under-assignment (uua) is a second measure that determines the number of times, and by how much, the utilisation of options within the subsequences are under \(p_j\):

$$\begin{aligned} uua(\pi )&=\sum _{i=1}^{O}\sum _{j=p_{i}}^{D}b_{ij}z_{ij}(\pi ) \end{aligned}$$
(4)
$$\begin{aligned} z_{ij}(\pi )&=\max \biggl \lbrace 0, p_{i} - \sum _{k=u_{i}(j)}^{j} r_{\pi (k)}[i] \biggr \rbrace , \end{aligned}$$
(5)

where \(b_{ij}\) is a penalty imposed for having fewer than \(p_{i}\) cars in subsequence positions \(j+1-q_{i}\) to j (if \(j<q_{i}\), positions 1 to j).

Penalties for over assignment are typically different to that of under assignment, and the penalties in both cases can vary across a sequence. If the penalties were set as constants, it would suffice to measure only over assignments (and hence over assignment penalties), as the total increase in over assignments leads to an equivalent decrease in under assignments in other parts of the sequence.

3.2 Mixed integer programming model

Bautista et al. (2008) provided an efficient MIP formulation for this problem. We use this formulation for the MIP-based methods that follow in this study.

As introduced earlier, y and z are chosen as variables (Sect. 3.1). Moreover, we define binary variables x where \(x_{it} = 1\) if car class i is chosen in position t. The objective is to minimize \(uoa(\pi )+uua(\pi )\) corresponding to Eqs. (3) and (4). The problem can be defined as follows:

$$\begin{aligned}&\text {Min.} \quad \sum _{i=1}^{O}\sum _{j=p_{i}}^{D}\qquad \left( a_{ij}y_{ij}+b_{ij}z_{ij}\right) \\&{\text {Subject to}}\nonumber \end{aligned}$$
(6)
$$\begin{aligned}&\sum _{i=1}^{K}x_{it}=1 \qquad t \in [1,\ldots ,D] \end{aligned}$$
(7)
$$\begin{aligned}&\sum _{t=1}^{D}x_{it}=d_{i}\qquad i \in [1,\ldots ,K] \end{aligned}$$
(8)
$$\begin{aligned}&\sum _{i=1}^{K}\sum _{u_j(t) }^{t}\mathbf {r}_i[j] \cdot x_{ik}=p_{j}+y_{jt}-z_{jt}\qquad \forall j\in [1,\ldots ,O], \quad t \in [p_{j},\ldots ,D] \end{aligned}$$
(9)
$$\begin{aligned}&z_{jt}, y_{jt}\ge 0 \qquad \forall j \in [1,\ldots ,O], \quad t \in [p_{j},\ldots ,D] \end{aligned}$$
(10)
$$\begin{aligned}&x_{it}\in [0,1]\qquad \forall i \in [1,\ldots ,K], \quad t\in [1,\ldots ,D]. \end{aligned}$$
(11)

This formulation differs slightly from the discussion of the problem in the previous section. Observing that two cars of the same class are interchangeable (i.e., swapping two cars of the same class leads to the same solution), this formulation focuses on selecting car classes in each position. It has the added advantage of reducing the solution space by removing symmetries.

4 Algorithms

A large number of solution methods have been proposed to tackle at least some variants of the car sequencing problem, including exact methods such as constraint programming (Dincbas et al., 1988), MIP (Gravel et al., 2005), meta-heuristics (Gottlieb et al., 2003; Sun & Fan, 2018) and hybrid methods (Khichane et al., 2008). A comprehensive review of this literature is beyond the scope of this paper. Here, we focus on techniques for solving the optimization variant of the car sequencing problem, including Lagrangian relaxation and ant colony optimization (LR-ACO) (Thiruvady et al., 2014), and large neighbourhood search (LNS) (Thiruvady et al., 2020). Further, we introduce a new adaptive LNS algorithm and investigate a MIP method with lazy constraints.

4.1 Lagrangian relaxation and ant colony optimization

An effective Lagrangian relaxation procedure from the MIP model (Sect. 3.2) was proposed by Thiruvady et al. (2014). Among the different sets of constraints that can be relaxed, they showed that Constraint (7) leads to the most effective Lagrangian relaxation. Hence, the resulting problem consists of all other constraints and the objective is modified as follows:

$$\begin{aligned} LLR(\lambda ) = \min \sum _{j=1}^O \sum _{t=1}^D (a_{jt} y_{jt} + b_{jt} z_{jt}) + \sum _{t=1}^D \lambda _t \left( \sum _{i=1}^K x_{it}\,-1\right) . \end{aligned}$$
(12)

Solving this relaxation leads to a solution that is potentially infeasible. Specifically, certain positions (i in \(x_{it}\)) may have no car class assigned and other positions may have two or more cars assigned. To rectify this infeasibility, Lagrangian multipliers (\(\lambda \)) are introduced, which penalise positions in the sequence where either zero or more than one car have been assigned. The multipliers are updated via subgradient optimization (Bertsekas, 1995), which aims to find the “ideal” set of multipliers over a number of iterations. The ACO algorithm (Dorigo & Stűtzle, 2004) is incorporated within Lagrangian relaxation as a repair method, leading to the LR-ACO heuristic. The flowchart of the LR-ACO method is shown in Fig. 2 and its pseudocode and associated details are presented in “Appendix A”.

Fig. 2
figure 2

A high-level view of the LR-ACO heuristic, where Lagrangian relaxation is first applied, and ACO is then used as a repair method

4.2 Large neighbourhood search

The LNS of Thiruvady et al. (2020) is used as the basis for the LNS implementations in this study. At a high-level the procedure works by starting with a (good) feasible solution, considering subsequences within the sequence, solving a MIP of the subsequences, and repeating this process until some termination criterion is met. The motivation for this approach is that solving the original MIP can be very time consuming, and hence solving smaller part of the original problem and piecing them together can be achieved efficiently.

Fig. 3
figure 3

An illustration of one iteration of the LNS method. In this example, there are ten cars and four car classes, indicated by the colours: blue, green, yellow and orange. The first four cars are considered in the MIP, and solved leading to swapping all four car classes. Following this step, there is a shift of two cars, and the next four cars are considered to be solved. The search in this iteration continues until reaching the end of the sequence

The procedure works as follows. First, a high quality solution is obtained in a short time-frame. Following this step, subsequences within the solution are considered sequentially. For each subsequence, a MIP (Sect. 3.2) is generated (i.e., subproblem) that fixes all variables in the original solution, but leaves those solution components associated with the subsequence free. The MIP is then solved, possibly to a time limit or gap for efficiency purposes.Footnote 1 If the subsequence sizes are small, all subproblems can be solved efficiently. In some cases large subproblems can also be efficiently solved. In one iteration of the algorithm, the search progresses by solving a subsequence, defining a new subsequence by moving forward a few positions, and continuing this process until the end of the subsequence is reached (see Fig. 3 for an example). Key to the success of the method is to move forward a few positions by ensuring at least some overlap with the previous subsequence. The above process is repeated for multiple iterations until the termination criterion is reached. The pseudocode of LNS and its associated details are presented in “Appendix B”.

The size of the subsequence (i.e., window size) has a large impact on the performance of the LNS algorithm. In our experiments, we will test two versions of LNS with different window sizes w: (1) 10-LNS with \(w=10\), and (2) LCM-LNS with w equal to the lowest common multiple of q values.

4.3 A new adaptive LNS approach

We make three modifications to the LNS algorithm proposed by Thiruvady et al. (2020) to improve its performance, creating a new adaptive LNS.

Firstly, we note that it is a waste of computational resources when solving consecutive subsequences of cars repeatedly. Considering a car sequence \(\pi \) to be improved, if a number of initial subsequences of \(\pi \) are already solved to optimality, there is no need to re-optimize those subsequences in the following iterations. To address this issue, we propose to optimize non-consecutive subsequences of \(\pi \) so that different subsequences are optimized in each iteration of the algorithm. More specifically, we generate a random permutation (P) of integers from 1 to D in each iteration, and solve non-consecutive subsequences of \(\pi \) indexed from \(P[\hat{s}]\) to \(P[\hat{s}+w]\), where w is the window size, and \(\hat{s}\) is the starting point, which increments by an interval of w/2 to generate multiple subsequences. Note that the randomization of P has the effect that any large neighbourhood consisting of w cars may be searched, rather than only subsequences of w consecutive cars as was the case in the original LNS.

Secondly, the original LNS algorithm uses a fixed window size w for an instance, i.e., setting w to a constant (e.g., ten) or the lowest common multiple of q values. However, we observe that for some instances, the algorithm can solve subsequences of small sizes very quickly and may get stuck in a local optimum in an early stage of the search. To address this potential issue, we introduce an adaptive setting for the window size, that is, starting with a window size of thirty and increasing the window size by one if the improvement of the optimality gap in an iteration is less than 0.005. By doing so, the algorithm can escape from a local optimum by using a larger window size. If computational time allowance is sufficient and the window size is eventually increased to D, solving the subsequence is equivalent to solving the whole sequence (i.e., the original problem).

In addition, given the vast improvement in commercial solvers, we use Gurobi to solve the MIP within a cutoff time to generate an initial feasible solution for the LNS algorithm, instead of using LR-ACO. This modification makes the algorithm simpler and more straightforward to implement.

4.4 MIP with lazy constraints

In similar fashion to Lagrangian relaxation, the lazy constraint method relaxes one or more of the sets of constraints. As we have previously motivated, solving the relaxed problem is typically time efficient, and the aim in the lazy method is to exploit this potential advantage. The key idea behind this procedure is that solutions typically violate a small number of constraints within the constraint set. If chosen carefully, it is possible to find high quality feasible solutions with little overhead.

The procedure works as follows. Prior to the execution of the algorithm, a constraint in the MIP formulation is chosen to be relaxed. The algorithm initializes the MIP model and executes without considering the chosen constraint. When a feasible solution is found, it is tested for feasibility against the constraint set that is relaxed. If violated, the specific constraint is introduced to the model as a “lazy” constraint (note, not the whole constraint set), and the procedure executes again. This process repeats until a provably optimal solution (to the original problem) is found or until a time limit expires.

For the purposes of this study, and in preliminary experimentation, we found that Constraints (9) are the most effective as implemented within the lazy method. This finding is not surprising, since relaxing this constraint set leads to a MIP that is equivalent to an assignment or bi-partite matching problem (Manlove, 2013). These problems are known to be polynomial, and can hence be solved efficiently. Note, there is a potential disadvantage to this approach, which is that the link between the x and y/z variables are broken. It means for specific problem instances, an initial feasible solution can be hard to find (i.e. too many lazy constraints are needed to find a feasible solution).

5 Feature extraction and instance generation

5.1 Features

We seek to determine the features of the car sequencing problem that will allow generating hard problem instances, i.e., ones that are considered hard for existing and newly proposed methods. There are certain features that are shown to lead to problem hardness as identified in previous studies (e.g. number of cars), and we also consider other features which could lead to creating hard instances (Gent & Walsh, 1999; Gravel et al., 2004; Perron & Shaw, 2004b). The following are the features chosen using past studies as a guide:

  • \(f_1\) number of cars (num-cars),

  • \(f_2\) number of options (num-options),

  • \(f_3\) number of car classes (num-classes),

  • \(f_4\) minimum of option utilisation (min-utilisation),

  • \(f_5\) average of option utilisation (ave-utilisation),

  • \(f_6\) maximum of option utilisation (max-utilisation),

  • \(f_7\) standard deviation of option utilisation (std-utilisation),

  • \(f_8\) average number of options per car class (ave-opt-class),

  • \(f_9\) minimum of p/q ratio (min-pq-ratio),

  • \(f_{10}\) average of p/q ratio (ave-pq-ratio),

  • \(f_{11}\) maximum of p/q ratio (max-pq-ratio),

  • \(f_{12}\) standard deviation of p/q ratio (std-pq-ratio),

  • \(f_{13}\) lowest common multiple of q values (lcm-q-values),

  • \(f_{14}\) standard deviation of car class populations (std-class-pop).

Note, the utilisation of an option o is defined as: \(\mu _o(\sum _{i = 1}^{D}r_i)/D\), where \(r_i = 1\) if and only if car i requires option o and \(\sum _{i = 1}^{D}r_i\) is the total number of cars that require option o, and \(\mu _o(t)\) is the minimum length sequence needed to accommodate option o occurring t times, i.e., \(\mu _o(t) = p_o((t-1)\) div \(q_o) + ((t-1)\) mod \(q_o) + 1\) (Perron & Shaw, 2004a).

The features \(f_1\)\(f_{12}\) have already been used in different ways to create current benchmark problem instances (Gent & Walsh, 1999; Gravel et al., 2004; Perron & Shaw, 2004b). Hence, we use them as part of our feature set. Moreover, we have seen in Thiruvady et al. (2020) that the LCM of q values typically leads to effective subsequences that can be solved efficiently, and hence we use this as Feature 13. Finally, all known problem instances have been obtained by using uniform distributions to assign cars to classes, whereas allowing variances in these assignments could impact the hardness of a problem. Hence, we also include Feature 14 in our feature set.

These features will be used to perform instance space analysis and also used as inputs into machine learning models to build algorithm selection models. Hence, they form the basis of this study. Moreover, we will systematically generate new problem instances by varying these features in the following.

5.2 Instances

We use a total of 247 problem instances in this study, including 121 existing benchmark instances and 126 newly generated instances.

The existing benchmark instances we have chosen consist of the nine instances available from CSPLIB (Gent & Walsh, 1999), 82 from (Perron & Shaw, 2004a), and 30 from (Gravel et al., 2005). These benchmark instances were previously considered “hard” to solve, though, the last decade and a half has seen rapid development of commercial solvers and extensive research in heuristic/meta-heuristic methods, a number of these problem instances can now be solved efficiently. Hence, we also generate new problem instances with various problem characteristics for stress testing of algorithm performance.

The new problem instances are generated as follows. To generate an instance specified with n cars, o options, and k car classes, we generate o-dimensional option vectors, by including option i with probability \(p_i/q_i\), until we reach k unique vectors. We then assign each car to a car class with uniform probability. Each of the following fourteen sets contain nine instances, three for each size of 100, 300, and 500 cars, for an overall total of 126 new instances:

  1. 1.

    noBhiU: no bias and high utilisation. This is the “default” set with \(o = 5\), \(k = 25\), \(1 \leqslant p \leqslant 3\), and \(1 \leqslant q - p \leqslant 2\). The car classes are selected uniformly on a per-car basis, and utilisation rate for all options is between 90 and 100%.

  2. 2.

    negBhiU: negative bias and high utilisation. It is the same as noBhiU, except that car classes with fewer options receive more cars. In particular, when a car is assigned to a given class k containing \(o_k\) options, we randomly reassign it with probability \(o_k/o\).

  3. 3.

    posBhiU: positive bias and high utilisation. It is the same as noBhiU, except that car classes with more options receive more cars. In particular, when a car is a assigned to a given class k containing \(o_k\) options, we randomly reassign it with probability \(1 - o_k/o\).

  4. 4.

    hiPQhiU: high p/q ratios and high utilisation. It is the same as noBhiU, except that \(2 \leqslant p \leqslant 4\).

  5. 5.

    hiPQmedU: high p/q ratios and medium utilisation. It is the same as hiPQhiU, except that all option utilisation rates are between 70 and 80%.

  6. 6.

    negBhiPQloU: negative bias, high p/q ratios, and low utilisation. It is the same as hiPQhiU, except that all option utilisation rates are between 50 and 60%, and negative bias is added. It is difficult to generate such low-utilisation instances using p/q option probabilities without negative bias.

  7. 7.

    loPQ: low p/q ratios. The settings for loPQ are: \(o = 8\), \(k = 20\), \(1 \leqslant p \leqslant 2\), \( 2 \leqslant q - p \leqslant 3\), and no lower bound on utilisation. It is difficult to generate low-p/q-ratio instances with \(o = 5\) and \(k = 25\).

  8. 8.

    negBfixedPQ: negative bias and fixed p and q. The parameter p is fixed to (3, 2, 1, 2, 1); q is fixed to (4, 3, 4, 5, 2); and there is no lower bound on utilisation. It is difficult to generate these instances (presumably again because of the low p/q ratio included, namely 1/4) without negative bias.

  9. 9.

    randN: random configuration of car classes. It is the same as noBhiU, except that we select uniformly from the \(\left( {\begin{array}{c}n+k-1\\ k-1\end{array}}\right) \) possible assignments of cars to car classes, instead of uniform distribution of cars to classes. Doing so increases the variance in car class populations.

  10. 10.

    RloU: random car classes and low utilisation (with option utilisation rate only restricted to \(>50\%\)).

  11. 11.

    RanyU: random car classes, no utilisation restrictions, and 10 options.

  12. 12.

    RnegBvloU: random car classes, negative bias, and very low utilisation (with option utilisation rate \(<60\%\)).

  13. 13.

    RnegBhiPQvloU: random car classes, negative bias, high p/q ratios, and very low utilisation (with option utilisation rate \(<60\%\)).

  14. 14.

    RnegBloPQvloU: random car classes, negative bias, low p/q ratios, and very low utilisation (with option utilisation rate \(<60\%\)).

While the above sets of parameter choices may seem somewhat arbitrary, they were selected after significant experimentation to provide a good coverage of the instance space, as will be shown in Sect. 6.1.

6 Experimental results

This section presents the experimental results of the instance space analysis and algorithm selection for the car sequencing problem. The solution algorithms were implemented in C++ and compiled with GCC-5.2.0, and the algorithm selection models were implemented in Python with the scikit-learn library (Pedregosa et al., 2011). The experiments were conducted on the Monash University campus cluster, MonARCH. The source code and problem instances are publicly available at https://github.com/yuansuny/CSP.git.

6.1 Visualizing instances

We create a 2-D instance space to visualize how well the newly generated problem instances complement the existing instances available in the literature. Each instance is represented by a vector of fourteen features listed in Sect. 5.1. For features that are not in the range of 0 and 1 (i.e., number of cars, number of options, number of car classes, and lowest common multiple of q values), we normalize them by their maximum value. For better visualization, we select eight features that are mostly correlated with the algorithm performances using information theoretic feature selection methods (Sun et al., 2020). The eight features selected are \(f_2\) (num-options), \(f_3\) (num-classes), \(f_4\) (min-utilisation), \(f_5\) (ave-utilisation), \(f_7\) (std-utilisation), \(f_8\) (ave-options), \(f_{10}\) (ave-pq-ratio), and \(f_{11}\) (max-pq-ratio).

We then use dimensionality reduction techniques to project the eight-dimensional feature vectors onto a 2-D plane. The techniques used are principal component analysis (PCA), t-distributed stochastic neighbour embedding (t-SNE) (Van der Maaten & Hinton, 2008) and uniform manifold approximation and projection (UMAP) (McInnes et al., 2018). PCA is a linear dimensionality reduction method, whilst t-SNE and UMAP are nonlinear ones. The resulting 2-D instance spaces are shown in Fig. 4. The percentage of variance explained versus the number of components selected in PCA is shown in Fig. 5. The first two principal components explain 88% of the data variance, and the corresponding transformation is listed as follows:

$$\begin{aligned} Z = \begin{bmatrix} 0.46417697 &{} 0.13118315 \\ -0.13786325 &{} 0.40042554 \\ -0.71876354 &{} 0.03250308 \\ -0.36351993 &{} -0.44927092 \\ 0.22506848 &{} -0.33242684 \\ -0.25113443 &{} 0.21863308 \\ -0.04415605 &{} 0.59942745 \\ 0.03303967 &{} 0.31926206 \end{bmatrix}^{T} \begin{bmatrix} \text {num-options} \\ \text {num-classes} \\ \text {min-utilisation} \\ \text {ave-utilisation} \\ \text {std-utilisation} \\ \text {ave-options} \\ \text {ave-pq-ratio} \\ \text {max-pq-ratio} \end{bmatrix}. \end{aligned}$$
(13)
Fig. 4
figure 4

The 2-D instance spaces created by PCA, t-SNE and UMAP. Here each subfigure, a dot represents a problem instance, and its colour indicates which category the instance belongs to

Fig. 5
figure 5

The percentage of variance explained versus the number of components selected in PCA. The first two components capture 88% of the data variance

From Fig. 4, we can observe that the newly generated instances complement the existing instances (i.e., CSPLIB, Gravel, and Perron & Shaw), and they together fill the instance spaces well. We also visualize where each category of the newly generated instances lies in the 2-D instance spaces in Fig. 6. In the 2-D space created by PCA, the instances are loosely spread out. In contrast, the instances form well separated clusters in the 2-D spaces created via t-SNE and UMAP, which is generally preferable in visualization. The location of the instances in the t-SNE space is similar to that in the PCA space. For example, the existing problem instances, i.e., CSPLIB, Gravel, and Perron & Shaw, mainly lie at the bottom of the PCA and t-SNE spaces. However, the instance space created by UMAP is very different from the spaces created by both PCA and t-SNE.

Fig. 6
figure 6

The distribution of the newly generated instances in the 2-D spaces created by PCA, t-SNE and UMAP

6.2 Visualizing algorithm performance

We test the performance of six algorithms on the 247 problem instances. The algorithms are LR-ACO, 10-LNS (LNS with a fixed window size of 10), LCM-LNS (LNS with a window size computed by the lowest common multiple of q values), and MIP (solved by Gurobi), as well as two new algorithms i.e., Adp-LNS (the adaptive version of LNS) and MIP with lazy constraints (solved by Gurobi). The parameter setting for LR-ACO is consistent with the original paper (Thiruvady et al., 2014). Gurobi 6.5.1 is used to solve the MIPs with 4 cores and 10GB of memory given. The parameter settings for Gurobi are tuned by hand: barrier iteration limit = 30; MIPgap = 0.005; Presolve = 1 (conservative); barrier ordering = nested dissection; cuts = -1 (aggressive); cut passes = 200 (large); barrier convergence tolerance = 0.0001; presparsify = 1 (turn on always); pre pass limit = 2. The time limit for each algorithm to solve an instance is set to one hour. The objective value (\(y_\mathrm {b}\)) generated by an algorithm for an instance is compared against the best known lower bound (\(y_\mathrm {lb}\)) for that instance to compute an optimality gap \((y_\mathrm {b}-y_\mathrm {lb})/y_\mathrm {b} \in [0,1)\). The optimality gap generated by an algorithm for an instance indicates how well the algorithm performs on that instance.

Fig. 7
figure 7

The optimality gap generated by the MIP, Lazy, and LR-ACO algorithms for the problem instances tested. Each dot represents a problem instance in the 2-D space created by PCA, t-SNE or UMAP, and the colour of the dots indicates the optimality gap (red is larger)

Fig. 8
figure 8

The optimality gap generated by the Adp-LNS, LCM-LNS, and 10-LNS algorithms for the problem instances tested

The performance of each algorithm is visualized in the 2-D instance spaces in Figs. 7 and 8, where each dot represents an instance and its colour indicates the optimality gap generated by the corresponding algorithm for that instance. We observe that most of the hard instances lie in the bottom left region of the instance spaces created by PCA and t-SNE, which corresponds to five categories of the newly generated instances (i.e., noBhiU, negBhiU, posBhiU, hiPQhiU, randN), as well as the CSPLIB and Gravel instances. Some of the newly generated RanyU and loPQ instances are also hard. The Perron & Shaw instances were considered to be among the hardest problem instances available in the literature (Perron & Shaw, 2004a), but we observe that these instances are relatively easy for the six algorithms to solve. Compared to PCA, the t-SNE method generates well separated clusters, allowing a clearer visualisation of which categories of instances are hard. Although the UMAP space is very different from the other two, we can make similar observation from the UMAP space. The LR-ACO algorithm is clearly outperformed by other algorithms, thus we will leave LR-ACO out in the subsequent analysis.

Fig. 9
figure 9

The distribution of the features (min-utilisation, ave-utilisation, ave-options, and num-options) in the 2-D instance space created via PCA, t-SNE or UMAP. Each dot represents a problem instance, and its colour indicates the value of the corresponding feature of that instance

Fig. 10
figure 10

The distribution of the features (std-utilisation, num-classes, ave-pq-ratio, and max-pq-ratio) in the 2-D instance space created via PCA, t-SNE or UMAP. Each dot represents a problem instance, and its colour indicates the value of the corresponding feature of that instance

6.3 Visualizing features

We visualize the eight features in the 2-D instance spaces created via PCA, t-SNE and UMAP in Figs. 9 and 10, where each dot represents an instance and its colour represents the value of corresponding feature. The aim is to identify which features make an instance hard to solve. We can observe that the minimum of option utilisation (min-utilisation) and average of option utilisation (ave-utilisation) are highly positively correlated with the optimality gap generated by the algorithms (see Figs. 7, 8 and 9), meaning that high option utilisation makes an instance hard to solve. This result is not surprising since higher option utilisation effectively means that satisfying the subsequence constraints is a lot more complex, and as a consequence, the modulation of the use of option is also more complex. Instances with a large average number of options per car class (ave-options) are mostly hard to solve. For the same reasons previously discussed, this result is also expected. However, the number of options (num-options) is negatively correlated with the optimality gap, indicating that a large number of options makes an instance easy to solve. This is an interesting outcome which shows that a large number of options allows less room for car classes to be sequenced in different positions. There is no significant correlation between the algorithm performance and other four features, i.e., standard deviation of option utilisation (std-utilisation), number of car classes (num-classes), average p/q ratio (ave-pq-ratio), and maximum p/q ratio (max-pq-ratio), as shown in Figs. 7, 8, and 10.

6.4 Algorithm selection

We build our algorithm selection model based on machine learning using the full set of fourteen features (Sect. 5.1) to select the best algorithm for solving a given instance. We perform four comparisons between the algorithms: (1) MIP vs Lazy, (2) Adp-LNS vs LCM-LNS, (3) Adp-LNS vs 10-LNS, and (4) MIP vs Adp-LNS. For each pair of algorithms (denoted as A and B), we divide the instances into three classes: (1) algorithm A performs better; (2) algorithm B performs better; or (3) algorithms A and B perform equally well if the difference between the optimality gaps generated are less than 0.005. It then becomes the standard three-class classification problem, and any classification algorithm can be used for this task. We have tested support vector machine (SVM) with a radial basis function kernel, k nearest neighbour (KNN), decision tree (DT) and logistic regression (LR) models. We have tuned the regularization parameter (C) of SVM and LR, number of neighbours (k) of KNN, and maximum tree depth (d) of DT, with the ten-fold cross-validation accuracy reported in Table 1. The best parameter values identified are \(C=1000\) for SVM, \(k=3\) for KNN, \(d=2\) for DT and \(C=10\) for LR. The other parameter settings are consistent with the default of the scikit-learn library.

Table 1 The ten-fold cross-validation accuracy of algorithm selection generated by the classifiers with different parameter settings
Table 2 The average ten-fold cross-validation accuracy of algorithm selection
Table 3 The AUC, precision, recall and F1-score generated by each machine learning model for the algorithm selection tasks

The average ten-fold cross-validation accuracy of algorithm selection over 100 runs using each machine learning model is reported in Table 2. The results show that the machine learning models achieve a reasonably good accuracy for the algorithm selection tasks, comparing to the ZeroR method that simply predicts the majority class. KNN is clearly outperformed by the other three machine learning models.

We report the AUC (area under the receiver operating characteristic curve), precision, recall and F1-score generated by each machine learning model for the algorithm selection tasks in Table 3. The machine learning models achieve a reasonably good performance in classifying majority class, while they may have trouble in predicting minority class when the data for the classification task is extremely unbalanced. For example, the machine learning models do not perform well in classifying the instances where 10-LNS outperforms Adp-LNS. However, this result is acceptable in practice because Adp-LNS generally performs much better than 10-LNS, and simply selecting Adp-LNS to solve all the instances would generate the best results for 94% of the instances when compared to 10-LNS.

Fig. 11
figure 11

Pairwise comparison between the algorithms in terms of the optimality gap generated. Each dot represents a problem instance in the 2-D space created by PCA, t-SNE or UMAP, and the colour of the dots indicates which algorithm performs better. The colour of a region indicates which algorithm performs better in that region predicted by SVM

Fig. 12
figure 12

The decision tree for selecting the best algorithm between the MIP method and Adp-LNS to solve a problem instance

Table 4 The logworth values, \(-\log _{10}(p)\), for each feature coefficient using the p-values (p) from the LR models. The logworth values greater than 1.30 (\(p<0.05\)) are bolded, indicating the corresponding features are significant
Fig. 13
figure 13

Algorithm selection among MIP, Lazy, Adp-LNS, LCM-LNS, and 10-LNS. Each dot represents a problem instance in the 2-D space created by PCA, t-SNE or UMAP, and the colour of the dots indicates which algorithm performs better in terms of the optimality gap. The colour of a region indicates which algorithm performs better in that region predicted by SVM

To gain a deeper insight into the algorithm performance, we visualize the decision boundary identified by SVM in the 2-D instance spaces created by PCA, t-SNE and UMAP in Fig. 11. The colour of a dot represents which algorithm performs better on that instance and the colour of a region indicates which algorithm performs better for that region based on the SVM prediction. We can clearly see that MIP performs equally well as Lazy on seven categories of newly generated easy problem instances (i.e., RloU, RnegBvloU, RnegBhiPQvloU, RnegBloPQvloU, hiPQmedU, negBhiPQloU, and negBfixedPQ), and outperforms Lazy on other instances. The Adp-LNS algorithm outperforms the other two LNS algorithms on most of the problem instances except RanyU instances. We then take the best performing exact algorithm, MIP, and compare it against the best performing local search method, Adp-LNS. Interestingly, by comparing Fig. 11 with Figs. 7 and 8, we can see a clear pattern that (1) Adp-LNS outperforms MIP on hard instances; (2) MIP outperforms Adp-LNS on medium-hard instances; and (3) they do equally well on easy instances. These observations are consistent when using different instance spaces created by PCA, t-SNE and UMAP.

Fig. 12 presents the decision tree learned by the DT classifier to select the best algorithm between MIP and Adp-LNS for solving a problem instance. Three features, max-utilisation, ave-pq-ratio, and min-utilisation, have been identified as important features by DT. These results provide great insights into the algorithms’ performance, and a simple guideline on which algorithm should be selected to solve a given car sequencing problem instance in practice.

To further investigate the significance of features for algorithm selection, we report in Table 4 the logworth value, \(-\log _{10}(p)\), for each feature coefficient, where p is the p-value of a feature coefficient from the LR models. Overall, the num-options, ave-options, ave-pq-ratio, and the option utilisation features (e.g., min-utilisation and ave-utilisation) are the most significant. These results are largely in agreement with those in Sect. 6.3, where we showed that min-utilisation, ave-utilisation, ave-options and num-options are highly correlated with the algorithm performance. The option utilisation features (i.e., min-utilisation, ave-utilisation, max-utilisation, and std-utilisation) are very useful for selecting the best algorithm between MIP and Adp-LNS, which is consistent with the results in Fig. 12. For the selection between MIP and Lazy, the num-cars, num-options, ave-options, and ave-pq-ratio features are significant. Interestingly, but not surprisingly, the lcm-q-values feature is significant for selection between Adp-LNS and LCM-LNS.

Finally, we build a machine learning model to select the best algorithm among MIP, Lazy, Adp-LNS, LCM-LNS, and 10-LNS for solving a problem instance. We divide the training problem instances into five classes, each with an algorithm that performs the best in that class (i.e., achieving the lowest optimality gap). An SVM model is then trained for this five-class classification problem, and the decision boundaries identified by SVM in the 2-D instance spaces created by PCA, t-SNE and UMAP are shown in Fig. 13. The Adp-LNS algorithm is predicted to perform best in the region with hard problem instances (e.g., noBhiU, negBhiU, posBhiU, hiPQhiU, randN, CSPLIB, and Gravel instances), and MIP is predicted to perform the best in most of the remaining space.

7 Conclusion

Car sequencing problems are NP-hard, and as such the quest for better solution methods is always an ongoing one. In this paper, we have performed an instance space analysis for the car sequencing problem, which enabled us to gain deeper insights into problem hardness and algorithm performance. This analysis was achieved by extracting a vector of features to represent an instance and projecting the feature vectors onto a 2-D space via principal component analysis. The instance space provided a clear visualization for the distribution of instances, their feature values and the performance of algorithms. We systematically generated instances with controllable properties and showed that the newly generated instances complement the existing benchmark instances well in the instance space. Further, our analysis showed that the instances with a high utilisation or a large average number of options per car class are mostly hard to solve. Finally, we built algorithm selection models using machine learning to select the best algorithm for solving an instance, and found that our new adaptive large neighbourhood search algorithm performed the best on hard problem instances, while the MIP solver, Gurobi, performed better on medium-hard instances. Our result based on decision tree provides a simple and effective guideline on which algorithm should be selected to solve a given instance. The new benchmark instances created here, and the insight as to which parts of the instance space are most amenable to integer programming versus local search approaches is expected to provide a useful basis for any future research in designing better algorithms for car sequencing problems.