Application of Information Theory and Learning to Network and Biological Tomography
Studying the internal characteristics of a network using measurements obtained from end hosts is known as network tomography. The foremost challenge in measurement-based approaches is network size: only a subset of measurements can be obtained because the entire network is not accessible. As the network becomes larger, a question arises as to how rapidly the monitoring resources (number of measurements or number of samples) must grow to obtain a desired monitoring accuracy. Our work studies how the number of measurements scales with the size of the network. We investigate the issues of scalability and performance evaluation in IP networks, specifically focusing on fault and congestion diagnosis. We formulate network monitoring as a machine learning problem using probabilistic graphical models that infer network states from path-based measurements. We consider the theoretical and practical management resources needed to reliably diagnose congested or faulty network elements and provide fundamental limits on the relationships between the number of probe packets, the size of the network, and the ability to accurately diagnose such network elements. We derive lower bounds on the average number of probes per edge under noisy probe measurements, using the variational inference technique proposed in the context of graphical models, and then propose an entropy lower (EL) bound by drawing similarities between the coding problem over a binary symmetric channel and the diagnosis problem. Our investigation is supported by simulation results. For the congestion diagnosis case, we propose a solution based on decoding linear error control codes on a binary symmetric channel for various probing experiments. To identify the congested nodes, we construct a graphical model and infer congestion using the belief propagation algorithm.
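The analogy between diagnosis and coding over a binary symmetric channel can be illustrated with a simple counting argument. The sketch below is not the EL bound derived in this work; it is a generic information-theoretic estimate under assumptions introduced here for illustration: each edge is independently congested with a hypothetical prior probability `prior`, each probe outcome is a single bit flipped with a hypothetical probability `noise`, and the function names are invented for this example. Under those assumptions, identifying the state of an edge requires about H(prior) bits, while one noisy probe can convey at most 1 - H(noise) bits.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits; defined as 0 at p = 0 and p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def probes_per_edge_lower_bound(prior: float, noise: float) -> float:
    """Illustrative lower bound on the average number of probes per edge.

    Assumptions (not taken from the source): edge states are i.i.d.
    Bernoulli(prior), and each probe behaves like one use of a binary
    symmetric channel with crossover probability `noise`.
    """
    capacity = 1.0 - binary_entropy(noise)  # BSC capacity in bits per use
    if capacity <= 0.0:
        return math.inf  # probes carry no information when noise = 0.5
    return binary_entropy(prior) / capacity
```

For example, `probes_per_edge_lower_bound(0.1, 0.0)` is roughly 0.47, reflecting that sparse congestion needs fewer than one noiseless probe per edge on average, and the estimate grows without bound as `noise` approaches 0.5.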
In the second part of the work, we focus on the development of methods to automatically analyze the information contained in electron tomograms, which is a major challenge since tomograms are extremely noisy. Advances in automated data acquisition in electron tomography have led to an explosion in the amount of data that can be obtained about the spatial architecture of a variety of biologically and medically relevant objects with sizes in the range of 10-1000 nm. A fundamental step in the statistical inference of large amounts of data is to segment relevant 3D features in cellular tomograms. Procedures for segmentation must work robustly and rapidly in spite of the low signal-to-noise ratios inherent in biological electron microscopy. This work evaluates various denoising techniques and then extracts relevant features of biological interest in tomograms of HIV-1 in infected human macrophages and in Bdellovibrio bacterial tomograms recorded at room and cryogenic temperatures. Our approach represents an important step in automating the efficient extraction of useful information from large datasets in biological tomography and in speeding up the process of reducing gigabyte-sized tomograms to relevant byte-sized data. Next, we investigate automatic techniques for segmentation and quantitative analysis of mitochondria in MNT-1 cells imaged using ion-abrasion scanning electron microscopy, and of tomograms of liposomal doxorubicin formulations (Doxil), an anticancer nanodrug, imaged at cryogenic temperatures. A machine learning approach is formulated that exploits texture features; joint block-wise image classification and segmentation is performed by histogram matching, using a nearest neighbor classifier with the chi-squared statistic as the distance measure.
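The histogram-matching step can be sketched as follows. This is a minimal illustration, not the pipeline used in this work: it substitutes plain intensity histograms for the texture features, assumes pixel values scaled to [0, 1], and uses hypothetical function names and class labels.

```python
from math import inf
from typing import Dict, List, Sequence


def normalized_histogram(values: Sequence[float], bins: int = 4,
                         lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Histogram of block values over [lo, hi], normalized to sum to 1."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


def chi_squared_distance(h1: Sequence[float], h2: Sequence[float],
                         eps: float = 1e-12) -> float:
    """Symmetric chi-squared statistic between two histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))


def classify_block(block: Sequence[float],
                   references: Dict[str, List[List[float]]],
                   bins: int = 4) -> str:
    """Label a block with the class of its nearest reference histogram."""
    h = normalized_histogram(block, bins)
    best_label, best_dist = "", inf
    for label, hists in references.items():
        for ref in hists:
            d = chi_squared_distance(h, ref)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label


# Hypothetical labeled training blocks (flattened pixel intensities).
references = {
    "background": [normalized_histogram([0.05, 0.10, 0.12, 0.20])],
    "feature": [normalized_histogram([0.80, 0.90, 0.85, 0.95])],
}
label = classify_block([0.10, 0.15, 0.07, 0.30], references)
```

Applying `classify_block` to every block of an image yields a label per block, which gives a coarse segmentation; in practice the reference histograms would be built from annotated regions and from texture descriptors rather than raw intensities.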