Efficient and refinable attack investigation
As modern attacks become more stealthy and persistent, detecting or preventing them at their early stages becomes virtually impossible. Instead, an attack investigation or provenance system aims to continuously monitor and log interesting system events with minimal overhead. Later, if the system observes any anomalous behavior, it analyzes the log to identify who initiated the attack and which resources were affected, and then to assess and recover from any damage incurred. However, because of a fundamental tradeoff between log granularity and system performance, existing systems typically either record system-call events without the detailed program-level activities (e.g., memory operations) required for accurately reconstructing attack causality, or demand that every monitored program be instrumented to provide program-level information. In this thesis, I will present my research on addressing this issue. First, I will present a Refinable Attack INvestigation system (RAIN) based on record-replay technology, which records system-call events at runtime and performs instruction-level dynamic information flow tracking (DIFT) during on-demand process replay. Instead of replaying every process with DIFT, RAIN conducts system-call-level reachability analysis to filter out unrelated processes and minimize the number of processes to be replayed, making inter-process DIFT feasible. Second, I will present a data flow tagging and tracking mechanism, called RTAG, which further enables practical cross-host attack investigations. RTAG allows lazy synchronization between independent and parallel DIFT instances on different hosts, and applies an optimal tag map to minimize memory consumption. Evaluation results show that RTAG is able to recover the true data flows of realistic cross-host attack scenarios with low time and memory cost. Furthermore, we deployed RAIN and RTAG in the red team adversarial engagements funded by the DARPA Transparent Computing program.
The data generated by our system effectively reconstructed the causality of the attacks with high accuracy, even in the presence of knowledgeable attackers.
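
The system-call-level reachability filtering described above can be illustrated with a minimal sketch. This is not RAIN's implementation: the event format, the node names (such as `proc:dropper` and `sock:exfil`), and the function names are all hypothetical, and the sketch ignores event timestamps, which a real causality analysis would need to respect. It only shows the general idea of walking a coarse provenance graph backward and forward from an anomalous object, so that only the processes on those paths become candidates for replay with DIFT.

```python
from collections import defaultdict, deque

# Hypothetical coarse-grained log: each pair (source, target) means that a
# system call may have moved data from source to target (assumed format).
EVENTS = [
    ("file:/tmp/payload", "proc:dropper"),        # dropper reads the payload
    ("proc:dropper", "proc:shell"),               # dropper spawns a shell
    ("proc:shell", "sock:exfil"),                 # shell writes to a socket
    ("file:/var/log/syslog", "proc:logrotate"),   # unrelated activity
    ("proc:logrotate", "file:/var/log/syslog.1"),
]


def build_graph(events):
    """Build forward and backward adjacency maps from (source, target) pairs."""
    forward = defaultdict(set)
    backward = defaultdict(set)
    for src, dst in events:
        forward[src].add(dst)
        backward[dst].add(src)
    return forward, backward


def reachable(graph, start):
    """Return every node reachable from start by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def processes_to_replay(events, anomaly):
    """Select processes causally connected to the anomalous object.

    Backward reachability approximates "where did this come from";
    forward reachability approximates "what did this affect".
    """
    forward, backward = build_graph(events)
    related = reachable(backward, anomaly) | reachable(forward, anomaly)
    return {node for node in related if node.startswith("proc:")}


if __name__ == "__main__":
    print(sorted(processes_to_replay(EVENTS, "sock:exfil")))
```

In this toy log, only `proc:dropper` and `proc:shell` are selected; `proc:logrotate` has no path to or from the anomalous socket, so it would not be replayed.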