
    Elf: Efficient lightweight fast stream processing at scale

    HU-DISSERTATION-2016.pdf (1.504Mb)
    Date
    2016-05-10
    Author
    Hu, Liting
    Abstract
    Large Internet companies like Facebook, Amazon, and Twitter are increasingly recognizing the value of stream data processing, using tools like Flume, Muppet, or Storm to continuously collect and process incoming data in real time to help govern company activities. Applications include monitoring marketing streams for business-critical decisions, identifying spam campaigns from social network streams, datacenter intrusion detection and troubleshooting, and others. Technical challenges for stream processing include how to scale to numerous concurrently running streaming jobs, how to coordinate across those jobs to share insights, how to make online changes to job functions to adapt to new requirements or data characteristics, and, for each job, how to operate efficiently over different time windows. This dissertation presents a new stream processing model, termed ELF, which addresses these challenges. ELF proposes a novel decentralized "many masters many workers" architecture implemented over a set of agents enriching the web tier of datacenter systems. ELF uses a DHT protocol to assign each job its respective set of master and workers, mapped onto the agents of the web tier. For each job, the live data streams generated by webservers are first divided into mini-batches, then inserted and aggregated as space-efficient compressed buffer trees (CBTs) in the local agents' memories. Second, per-batch results are "flushed" from the CBTs to be rolled up and aggregated via shared reducer trees (SRTs), in ways that naturally balance SRT-induced load, reduce processing latencies, and allow online job changes along with cross-job coordination. An ELF prototype, implemented and evaluated in a larger-scale configuration, demonstrates scalability, high per-node throughput, sub-second job latency, and the ability to adjust the actions of running jobs in under a second.
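The data flow described in the abstract can be illustrated with a small Python sketch. It is not ELF's implementation: the function names, the SHA-1 hash ring, the fixed batch size, and the agent and job identifiers are all assumptions made for illustration. A plain `Counter` stands in for a compressed buffer tree, and a flat merge stands in for a shared reducer tree.

```python
import hashlib
from bisect import bisect_right
from collections import Counter
from itertools import islice


def ring_position(name: str) -> int:
    """Map a name to a point on a hash ring (SHA-1 is an assumption here)."""
    return int(hashlib.sha1(name.encode("utf-8")).hexdigest(), 16)


def master_for(job_id: str, agents: list[str]) -> str:
    """DHT-style assignment: the job's master is the successor agent on the ring."""
    ring = sorted((ring_position(a), a) for a in agents)
    positions = [p for p, _ in ring]
    idx = bisect_right(positions, ring_position(job_id)) % len(ring)
    return ring[idx][1]


def mini_batches(events, size):
    """Yield successive mini-batches of at most `size` events from a stream."""
    it = iter(events)
    while batch := list(islice(it, size)):
        yield batch


def aggregate(batch):
    """Local per-batch aggregation (a plain Counter, not a real CBT)."""
    return Counter(batch)


def roll_up(partials):
    """Merge flushed per-batch results (a flat merge, not a real SRT)."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical agents, job, and event stream.
agents = [f"agent-{i}" for i in range(8)]
master = master_for("job-42", agents)
stream = ["a", "b", "a", "c", "a", "b", "c"]
result = roll_up(aggregate(b) for b in mini_batches(stream, 3))
```

Because each job hashes to its own point on the ring, different jobs generally land on different master agents, which is one simple way to picture the decentralized "many masters" layout.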
    URI
    http://hdl.handle.net/1853/55576
    Collections
    • College of Computing Theses and Dissertations [1191]
    • Georgia Tech Theses and Dissertations [23877]
