Solving a mixed-integer programming formulation of a classification model with misclassification limits
Brooks, J. Paul
MetadataShow full item record
Classification, the development of rules for the allocation of observations to one or more groups, is a fundamental problem in machine learning and has been applied to many problems in medicine and business. We consider aspects of a classification model developed by Gallagher, Lee, and Patterson that is based on a result by Anderson. The model seeks to maximize the probability of correct G-group classification, subject to limits on misclassification probabilities. The mixed-integer programming formulation of the model is an empirical method for estimating the parameters of an optimal classification rule, which are identified as coefficients of linear functions by Anderson. The model is shown to be a consistent method for estimating the parameters of the optimal solution to the problem of maximizing the probability of correct classification subject to limits on inter-group misclassification probabilities. A polynomial time algorithm is described for two-group instances. The method is NP-complete for a general number of groups, and an approximation is formulated as a mixed-integer program (MIP). The MIP is difficult to solve due to the formulation of constraints wherein certain variables are equal to the maximum of a set of linear functions. These constraints are conducive to an ill-conditioned coefficient matrix. Methods for generating edges of the conflict graph and conflict hypergraphs are discussed. The conflict graph is employed for finding cuts in a branch-and-bound framework. This technique and others lead to improvement in solution time over industry-standard software on instances generated by real-world data. The classification accuracy of the model in relation to standard classification methods on real-world and simulated data is also noted.