Compiler Optimizations for Java Aglets in Distributed Data Intensive Applications
Code migration in the context of distributed data-intensive computing poses interesting compilation issues. In this work, we first define a small extension to the aglet model that allows data distribution. In our aglet programs, data are distributed over the network using annotations (similar to HPF, where the programmer specifies data distributions through annotations). We analyze the program using these annotations and the data sizes, and apply the owner computes rule to determine where each computation should take place. Our compiler infrastructure, called the Compiler Scheduler (CS), then schedules the aglet through the network. We propose two strategies to optimize the aglet schedule. The first strategy, Take All Live Data (TALD), attempts to carry all live variable definitions from a node when it is visited. The second strategy, Take Only Needed Data (TOND), attempts to carry only those definitions that are used at the destination node. TALD aims to minimize the number of migrations, which are expensive because serializing the data for each migration can take on the order of milliseconds. TOND aims to minimize bandwidth consumption during each migration; carrying only the minimal amount of data per migration can significantly reduce communication overhead. We implemented the CS infrastructure, including both strategies, in IBM's Jikes compiler. We evaluated it on a distributed database application and show the benefits of both strategies on large and small databases. We also evaluated our strategies on typical distributed data operations, such as Gather, Fusion, Consistency Check, and MergeSort, comparing our schedules against randomized ones. The results show that the schedules generated by our compiler infrastructure outperform random ones.
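The abstract does not describe the compiler's internal representation, so what follows is only a hypothetical model of the TALD/TOND trade-off, not the CS implementation in Jikes. It assumes a straight-line program of statements `lhs = f(rhs...)`, an `owner` map from each variable to the node that stores it, and per-variable sizes in bytes; the names `place`, `exposed_reads`, and `simulate` are illustrative. Placement follows the owner computes rule. On each hop, TALD carries every live value not already resident at the destination, while TOND carries only the values read by the statements run there. When TOND has left a needed value behind, the sketch charges an assumed two-migration round trip to the value's owner.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical statement form "lhs = f(rhs...)": (lhs, tuple of variables read).
Stmt = Tuple[str, Tuple[str, ...]]


def place(program: List[Stmt], owner: Dict[str, str]) -> List[str]:
    """Owner computes rule: run each statement on the node that owns its lhs."""
    return [owner[lhs] for lhs, _ in program]


def exposed_reads(stmts: List[Stmt]) -> Set[str]:
    """Variables read in stmts before being redefined within stmts (i.e. live on entry)."""
    exposed: Set[str] = set()
    killed: Set[str] = set()
    for lhs, rhs in stmts:
        exposed |= {v for v in rhs if v not in killed}
        killed.add(lhs)
    return exposed


def simulate(program: List[Stmt], owner: Dict[str, str],
             sizes: Dict[str, int], strategy: str) -> Tuple[int, int]:
    """Return (migrations, bytes carried) for one pass of the aglet."""
    if strategy not in ("TALD", "TOND"):
        raise ValueError(strategy)
    nodes = place(program, owner)
    bag: Set[str] = set()       # values the aglet currently carries
    defined: Set[str] = set()   # values already written at their owner node
    migrations = 0
    moved = 0
    for i, (lhs, rhs) in enumerate(program):
        here = nodes[i]
        if i > 0 and nodes[i - 1] != here:
            src = nodes[i - 1]
            # What the aglet can take from src: its own bag plus src's stored values.
            available = bag | {v for v in defined if owner[v] == src}
            # Keep live values that do not already reside at the destination.
            live = exposed_reads(program[i:])
            carry = {v for v in available & live if owner[v] != here}
            if strategy == "TOND":
                # Restrict to values read by the run of statements on the destination.
                j = i
                while j < len(program) and nodes[j] == here:
                    j += 1
                carry &= exposed_reads(program[i:j])
            bag = carry
            migrations += 1
            moved += sum(sizes[v] for v in carry)
        for v in rhs:
            if v not in bag and owner[v] != here:
                # Value was left behind: assume a round trip to its owner to fetch it.
                migrations += 2
                moved += sizes[v]
                bag.add(v)
        defined.add(lhs)
    return migrations, moved


if __name__ == "__main__":
    prog = [("a", ()), ("b", ()), ("c", ("a", "b"))]
    owner = {"a": "A", "b": "B", "c": "C"}
    sizes = {"a": 100, "b": 10, "c": 1}
    print(simulate(prog, owner, sizes, "TALD"))  # (2, 210)
    print(simulate(prog, owner, sizes, "TOND"))  # (4, 110)
```

In this toy example, TALD carries the 100-byte `a` through node B and makes 2 migrations moving 210 bytes. TOND drops `a` at B and must fetch it later, so it makes 4 migrations moving 110 bytes, which mirrors the migrations-versus-bandwidth trade-off between the two strategies. A real scheduler would also handle control flow and batch fetches; this sketch does neither.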