Semeru: A Memory-Disaggregated Managed Runtime

Brief Summary

Programs written in managed languages must deal with periodic GC, which is essentially a graph workload with poor locality. This paper presents Semeru, a distributed JVM that improves managed cloud application performance in a memory-disaggregated environment. The main innovations of Semeru include a universal Java heap, a distributed GC, and a swap system in the OS kernel.

Questions:

  • Are most datacenter applications written in managed languages?

Introduction

Datacenter Types:

  • monolithic: built with monolithic servers, each of which tightly integrates a small amount of each type of resource (e.g., CPU, memory, storage).

  • resource-disaggregated: contains servers dedicated to individual resource types.

Why resource disaggregation: To improve resource utilization, failure isolation, and elasticity.

SOTA:

  • LegoOS: what are "loosely coupled monitors"?

  • InfiniSwap: a paging system that leverages RDMA

Problems:

  • RDMA provides efficient remote data access but still incurs microsecond-level latency. Moreover, no existing optimization considers the run-time semantics of the program (e.g., locality).

  • The CPU server has only a small amount of memory that stores recently fetched pages, and a cache miss triggers a page fault to fetch data from a memory server. Hence, programs must possess excellent spatial and/or temporal locality to reduce cache misses.

  • This need for high locality creates two practical challenges for cloud applications:

    • Typical cloud applications are written in managed languages that execute atop a managed runtime. However, the GC of the runtime is a graph workload that suffers from poor locality.

    • Managed programs make heavy use of object-oriented data structures, whose element objects can be distributed across memory servers. A sequential scan with frequent remote fetches can incur a high performance penalty.
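The poor locality of GC tracing can be seen in a minimal mark-phase sketch (the `ObjNode` class and all names here are my own illustration, not Semeru's or HotSpot's code): the traversal order is dictated by the object graph rather than the memory layout, so in a disaggregated setting each pointer dereference may fault in a different remote page.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Minimal sketch of a tracing GC's mark phase. Consecutive dereferences can
// touch arbitrary (possibly remote) pages -- the "graph workload" problem.
public class MarkSketch {
    static class ObjNode {
        boolean marked;
        final List<ObjNode> fields = new ArrayList<>();
    }

    // Classic iterative mark: pointer chasing with no spatial locality.
    static int mark(List<ObjNode> roots) {
        Deque<ObjNode> stack = new ArrayDeque<>(roots);
        int live = 0;
        while (!stack.isEmpty()) {
            ObjNode o = stack.pop();
            if (o.marked) continue;
            o.marked = true;          // on a CPU server this touch may fault in a remote page
            live++;
            stack.addAll(o.fields);   // each field can point to an arbitrary address
        }
        return live;
    }

    public static void main(String[] args) {
        ObjNode a = new ObjNode(), b = new ObjNode(), c = new ObjNode();
        ObjNode unreachable = new ObjNode();  // never marked: not on any root path
        a.fields.add(b);
        b.fields.add(c);
        c.fields.add(a);              // cycle is handled by the mark bit
        List<ObjNode> roots = new ArrayList<>();
        roots.add(a);
        System.out.println(mark(roots));  // prints 3
    }
}
```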

Contributions:

Thoughts

  • Besides garbage, after GC the JVM also has live objects that are evacuated into new regions. The allocation information for these objects cannot be fetched right after the marking phase of the GC.

  • The universal Java heap (UJH) seems similar to Unified Virtual Memory in GPU.

    • The UJH handles page faults to swap in pages from memory servers, and performs eviction operations to swap out pages. Does this look like a delegation mechanism that bypasses the OS?

    • two states for virtual address spaces: cached or evicted.
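The two-state view of UJH address ranges could be modeled as a tiny state machine (the class, method names, and the per-page map below are my own toy model, not the paper's data structures): a page fault moves a range from evicted to cached, and eviction does the reverse.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the two states a UJH virtual-address page can be in on the
// CPU server: CACHED (backed by local memory) or EVICTED (resident only on
// a memory server).
public class UjhStateSketch {
    enum State { CACHED, EVICTED }

    private final Map<Long, State> pageState = new HashMap<>();

    // Touching an evicted page triggers a "page fault": the RDMA fetch is
    // modeled here as a simple state flip to CACHED.
    State touch(long pageAddr) {
        if (stateOf(pageAddr) == State.EVICTED) {
            pageState.put(pageAddr, State.CACHED);  // swap-in path
        }
        return stateOf(pageAddr);
    }

    // Under memory pressure, write the page back and mark it evicted.
    void evict(long pageAddr) {
        pageState.put(pageAddr, State.EVICTED);     // swap-out path
    }

    // Pages start out evicted: the heap lives on the memory servers.
    State stateOf(long pageAddr) {
        return pageState.getOrDefault(pageAddr, State.EVICTED);
    }
}
```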

  • The assumption is one CPU server and multiple memory servers. Its coherence protocol is simpler than distributed shared memory (DSM) because the memory servers do not execute any mutator code. Is this a good/practical assumption?

  • Is G1's region-based heap design the same as the balanced GC policy in OpenJ9? The reason to adopt it: a region-based design enables modular tracing and reclamation.
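My reading of "modular reclamation" is that liveness is accounted per region, so a region can be reclaimed as a unit without touching the rest of the heap. A minimal sketch (invented names; not G1 or OpenJ9 code):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of why a region-based heap enables modular reclamation: each region
// carries its own liveness accounting, so entirely dead regions are reclaimed
// wholesale while the rest of the heap is left untouched.
public class RegionSketch {
    static class Region {
        final int id;
        long liveBytes;          // would be updated by a per-region trace
        Region(int id) { this.id = id; }
    }

    // Reclaim every region with no live data; in a real collector, regions
    // with little live data would instead become evacuation candidates.
    static List<Integer> reclaimEmpty(List<Region> heap) {
        List<Integer> reclaimed = new ArrayList<>();
        for (Region r : heap) {
            if (r.liveBytes == 0) reclaimed.add(r.id);
        }
        heap.removeIf(r -> r.liveBytes == 0);
        return reclaimed;
    }
}
```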

  • Allocation and Cache management: ...

  • TODO: compare the differences between the balanced GC and Semeru's GC!

  • A potential problem: before evacuation happens on a memory server, Semeru must ensure that the regions to be evacuated have all their pages evicted, to avoid inconsistency. Even though region selection is based on the ratio of evicted pages and the region's age, this can still be a problem.
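The selection heuristic mentioned above could look roughly like the following; the scoring formula, its weights, and all names are my guess for illustration, not the paper's actual policy.

```java
import java.util.Comparator;
import java.util.List;

// Sketch of memory-server-side region selection for evacuation: prefer
// regions whose pages are mostly evicted from the CPU server's cache (safer
// to collect remotely) and that are old (more likely to contain garbage).
public class EvacSelectSketch {
    static class RegionInfo {
        final int id;
        final double evictedRatio; // fraction of the region's pages evicted
        final int age;             // GC cycles survived
        RegionInfo(int id, double evictedRatio, int age) {
            this.id = id; this.evictedRatio = evictedRatio; this.age = age;
        }
    }

    // Invented score: evicted ratio dominates, age breaks ties.
    static double score(RegionInfo r) {
        return r.evictedRatio + 0.01 * r.age;
    }

    // Pick the best candidate; a real system would also require the evicted
    // ratio to clear a safety threshold before evacuating remotely.
    static RegionInfo pick(List<RegionInfo> regions) {
        return regions.stream()
                .max(Comparator.comparingDouble(EvacSelectSketch::score))
                .orElseThrow();
    }
}
```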

  • For task 3 of the CSSC, after the CPU server finishes tracing, it has to write new regions back to their respective memory servers. Why not keep data that is potentially hot, to reduce traffic?

  • In rare cases, Semeru runs a full-heap scan that brings all objects into the cache for tracing and collection. Why not let the CPU server and the memory servers scan together?

  • Why can the CSSC not reclaim dead objects that span different regions and form cycles? What does this mean?

  • For the swapping system, why should page fetching and eviction go through a data path inside the kernel? What does "provide transparency to applications" mean?

[website], [paper], [slides], [github]
