Statistics, Computation & Applications
When statistics meets real applications, the computational aspect of statistical methods becomes critical. In this dissertation, I try to improve the computational efficiency of some statistical methods so that they become both computationally and statistically optimal. Inspired by the recent development of distance-based methods in statistics, I first propose a novel distance-based canonical analysis method. Second, I study an efficient algorithm for calculating distance-based statistics. Third, I develop a new semidefinite programming algorithm for power flow analysis problems; it appears to be more robust than existing methods. More details follow. In the first part of this dissertation, we introduce a novel dimension reduction method called distance-based independence screening for canonical analysis (DISCA), which can be used to reduce the dimensions of two random vectors of arbitrary dimensions. The essence of DISCA is to use a distance-based independence measure, the distance correlation proposed by Székely and Rizzo in 2007, to eliminate "redundant" dimensions until no further elimination is feasible. Numerically, DISCA amounts to solving a non-convex optimization problem. Algorithms and theoretical justifications are provided, and comparisons with existing methods demonstrate its accuracy, universality, and effectiveness. An R package, DISCA, can be found on GitHub. Because the distance correlation used in DISCA becomes computationally expensive as the dimension of the space increases, in the second part of this dissertation we accelerate the calculation of distance-based statistics by projecting multidimensional variables onto pre-specified projection directions. This improves the computational complexity from O(m²) to O(nm log(m)), where n is the number of projection directions and m is the sample size. Computational savings are achieved when n ≪ m/log(m).
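For reference, the quadratic cost mentioned above comes from the pairwise distance matrices in the sample distance correlation. The following is a minimal sketch of the standard (biased, V-statistic) sample version, not the code of the DISCA package; the function names are hypothetical and only the Python standard library is used.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


def double_center(d):
    """Subtract row and column means and add back the grand mean."""
    m = len(d)
    row_means = [sum(row) / m for row in d]
    grand_mean = sum(row_means) / m
    # d is symmetric, so the column means equal the row means.
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(m)]
        for i in range(m)
    ]


def mean_product(a, b):
    """Average of the entrywise product of two m-by-m matrices."""
    m = len(a)
    return sum(a[i][j] * b[i][j] for i in range(m) for j in range(m)) / (m * m)


def distance_correlation(xs, ys):
    """Sample distance correlation of paired observations xs[i], ys[i].

    Each observation is a tuple of coordinates; xs and ys may have
    different dimensions. Time and memory are both O(m^2) in the sample size m.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of observations")
    a = double_center(pairwise_distances(xs))
    b = double_center(pairwise_distances(ys))
    dcov2_xy = mean_product(a, b)
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    if denom == 0.0:
        # At least one sample is constant; the statistic is defined as 0.
        return 0.0
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(dcov2_xy, 0.0) / denom)
```

Scalar observations are passed as one-element tuples, for example `distance_correlation([(0.1,), (0.4,), (0.9,)], [(1.0,), (2.0,), (5.0,)])`.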
The optimal pre-specified projection directions can be obtained by minimizing the worst-case difference between the true distance and the approximated distance. We provide solutions and greedy algorithms for different scenarios, and confirm the advantage of our technique over the pure Monte Carlo approach, in which the directions are randomly selected rather than pre-calculated. In the third part of this dissertation, we turn our focus to applications of statistical computational algorithms in the power systems area. A new semidefinite programming algorithm is proposed to solve the power flow and power system state estimation problems. Both kinds of problems are non-convex, and convex relaxation is the typical approach to non-convexity in the power systems area, although the objective functions must be carefully designed to preserve equivalence before and after relaxation. We first reformulate the two types of complex-valued problems as non-convex real-valued ones. We show that an alternating semidefinite programming algorithm can be applied and is not sensitive to the starting point, without sacrificing accuracy. Furthermore, it performs well even when the voltage angles are not close to zero. Convergence analysis is provided, and numerical studies on representative power systems datasets demonstrate the accuracy of our proposed algorithm and its applicability to various scenarios with different sets of given measurements.
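To illustrate the projection idea from the second part, the sketch below approximates a Euclidean norm by averaging absolute one-dimensional projections. It relies on the identity |x| = C_p · E|uᵀx| for u uniform on the unit sphere in p dimensions, with C_p = √π Γ((p+1)/2) / Γ(p/2). The directions here are drawn at random, which corresponds to the pure Monte Carlo baseline; the pre-specified directions studied in the dissertation are not reproduced, and all function names are hypothetical.

```python
import math
import random


def projection_constant(p):
    """C_p such that |x| = C_p * E|u^T x| for u uniform on the unit sphere in R^p."""
    return math.sqrt(math.pi) * math.gamma((p + 1) / 2) / math.gamma(p / 2)


def random_unit_vector(p, rng):
    """Draw a direction uniformly on the unit sphere by normalizing a Gaussian vector."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:
            return [v / norm for v in g]


def projected_norm(x, directions):
    """Approximate the Euclidean norm of x from its projections onto the given directions."""
    mean_abs = sum(
        abs(sum(u_k * x_k for u_k, x_k in zip(u, x))) for u in directions
    ) / len(directions)
    return projection_constant(len(x)) * mean_abs


if __name__ == "__main__":
    rng = random.Random(0)
    x = [3.0, 4.0, 0.0, 0.0, 0.0]  # true Euclidean norm is 5
    directions = [random_unit_vector(len(x), rng) for _ in range(2000)]
    print(projected_norm(x, directions))
```

Once the data are reduced to one-dimensional projections, each projected statistic can be computed by sorting, which is where the log(m) factor in the stated complexity comes from; that fast one-dimensional step is not shown here.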