Tuesday, September 23, 2014

Garbage Collection in Java

Unlike C, Java allocates and de-allocates memory automatically. De-allocation is done by the garbage collector (GC), and this post focuses on that de-allocation side. Automatic garbage collection is the process of identifying which objects are still in use, removing the unused ones, and compacting the memory.

Garbage collection is done in phases:
  • Mark: Identifies which objects are in use and which are not.
  • Sweep: De-allocates the objects that are not in use.
  • Compact: Improves the performance of later allocation. After de-allocation, the surviving objects may be scattered across the memory. The compact phase moves all the referenced objects together so that the free space forms one contiguous area at one end.
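The three phases can be sketched with a toy heap. This is only a minimal illustration, assuming an array of slots stands in for memory; the class ToyHeap and its methods are hypothetical names, and a real JVM collector works very differently inside.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Toy model of mark, sweep and compact over an array-backed "heap" (hypothetical, not HotSpot internals). */
class ToyHeap {
    /** A simulated object: a name, a mark bit and its outgoing references. */
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    final Obj[] slots;

    ToyHeap(int size) { slots = new Obj[size]; }

    /** Allocates into the first free slot. */
    Obj allocate(String name) {
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] == null) {
                slots[i] = new Obj(name);
                return slots[i];
            }
        }
        throw new IllegalStateException("toy heap is full");
    }

    /** Mark: flag every object reachable from the roots. */
    void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (!o.marked) {
                o.marked = true;
                pending.addAll(o.refs);
            }
        }
    }

    /** Sweep: free unmarked slots, reset mark bits, return the number of freed slots. */
    int sweep() {
        int freed = 0;
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] == null) continue;
            if (slots[i].marked) {
                slots[i].marked = false;
            } else {
                slots[i] = null;
                freed++;
            }
        }
        return freed;
    }

    /** Compact: slide the survivors to the front so the free space is contiguous. */
    void compact() {
        int next = 0;
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] != null) {
                Obj o = slots[i];
                slots[i] = null;
                slots[next++] = o;
            }
        }
    }

    /** One character per slot: the object name, or '.' for a free slot. */
    String layout() {
        StringBuilder sb = new StringBuilder();
        for (Obj o : slots) sb.append(o == null ? "." : o.name);
        return sb.toString();
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap(6);
        Obj a = heap.allocate("A");
        heap.allocate("B");            // never referenced, so it is garbage
        Obj c = heap.allocate("C");
        a.refs.add(c);                 // A -> C keeps C alive
        heap.mark(Arrays.asList(a));   // A is the only root
        System.out.println("freed " + heap.sweep() + ", layout " + heap.layout());
        heap.compact();
        System.out.println("after compact " + heap.layout());
    }
}
```

After the sweep there is a hole where B used to be; the compact step closes it, which is what makes the next allocation cheap.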
The next question is how the GC knows which objects are live. If a reference to an object is still reachable, the object is classified as live (the "Reference Object" in the picture). A reference may come from the program itself or from another memory area; for example, objects in the Young generation may be referred to by objects in the Old generation. For this case, the Old generation keeps a fixed-size structure called the "card table". Each entry covers a small chunk (a "card") of the Old generation and is marked when a reference is written into that chunk. During a Young generation collection, the GC looks at the card table and scans only the marked cards to find references from the Old generation, instead of scanning the whole Old generation.
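The card table idea can be sketched as follows. This is a simplified illustration under stated assumptions: the class ToyCardTable, its card size of 4 slots and its methods are hypothetical, and every stored reference is treated as pointing into the Young generation.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy card table: one dirty flag per fixed-size chunk ("card") of a simulated Old generation. */
class ToyCardTable {
    static final int CARD_SIZE = 4;   // slots per card; hypothetical value for illustration
    final Object[] oldGen;            // each slot holds a reference into the Young generation, or null
    final boolean[] dirty;            // one flag per card

    ToyCardTable(int oldGenSlots) {
        oldGen = new Object[oldGenSlots];
        dirty = new boolean[(oldGenSlots + CARD_SIZE - 1) / CARD_SIZE];
    }

    /** Write barrier: store the reference and mark the covering card dirty. */
    void writeRef(int slot, Object youngTarget) {
        oldGen[slot] = youngTarget;
        dirty[slot / CARD_SIZE] = true;
    }

    /** Number of cards a Young collection would have to scan. */
    int dirtyCards() {
        int count = 0;
        for (boolean d : dirty) if (d) count++;
        return count;
    }

    /** Young collection: scan only the dirty cards for references into the Young generation. */
    List<Object> rootsFromOldGen() {
        List<Object> roots = new ArrayList<>();
        for (int card = 0; card < dirty.length; card++) {
            if (!dirty[card]) continue;   // clean card: skipped without looking at its slots
            int end = Math.min((card + 1) * CARD_SIZE, oldGen.length);
            for (int slot = card * CARD_SIZE; slot < end; slot++) {
                if (oldGen[slot] != null) roots.add(oldGen[slot]);
            }
        }
        return roots;
    }

    public static void main(String[] args) {
        ToyCardTable table = new ToyCardTable(16);
        table.writeRef(5, "young-object");
        System.out.println("dirty cards: " + table.dirtyCards() + " of " + table.dirty.length);
        System.out.println("roots found: " + table.rootsFromOldGen());
    }
}
```

The point of the design is that the cost of finding Old-to-Young references depends on how many cards were written to, not on the size of the Old generation.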

Garbage Collectors

There are a few types of garbage collectors, which have evolved over time.

Serial GC

Serial GC is the oldest collector and is suited to machines with a single CPU. It pauses the application while it goes through the Mark, Sweep and Compact phases on a single thread. On machines with several CPUs this can cost the application throughput, because the other CPUs sit idle during a collection. It is typically the default GC on single-CPU machines.
The command line flag for using this GC is -XX:+UseSerialGC
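To check which collector a JVM actually picked, the standard java.lang.management API can list the active collectors. The sketch below uses only that API; the class name ActiveCollectors is just an illustrative choice.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Lists the garbage collectors the running JVM reports through JMX. */
class ActiveCollectors {
    /** Returns the collector names as reported by the JVM. */
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collection count and accumulated collection time since JVM start
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
    }
}
```

Run it with and without -XX:+UseSerialGC and compare the names. The exact names depend on the JVM vendor and version; on HotSpot the serial collector usually shows up as "Copy" and "MarkSweepCompact".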

Parallel GC

As the name indicates, multiple GC threads run during garbage collection. By default, the number of GC threads is based on the number of CPUs (on small machines it equals the CPU count). If there is only one CPU, it behaves like the Serial GC. The number of threads can be controlled using the command line switch: -XX:ParallelGCThreads=<no_of_threads>. It is also called the "Throughput GC", because collecting in parallel shortens the time spent in GC and leaves more CPU time for the application.
The command line switch to enable this GC is: -XX:+UseParallelGC. In older JDK releases this creates parallel threads for the Young Generation GC, but only one thread for the Old Generation GC. To collect the Old Generation with multiple threads as well, use the command line switch -XX:+UseParallelOldGC (use it on its own, not in conjunction with -XX:+UseParallelGC).
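A quick way to see what the JVM was started with is to print the CPU count and the GC-related switches. This is a small sketch, assuming that "GC-related" simply means any -XX: argument containing "GC"; the class GcFlags and that filter rule are illustrative choices.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Prints the CPU count and the GC-related switches the JVM was started with. */
class GcFlags {
    /** Keeps only -XX: arguments that mention "GC" (an assumed, simple filter). */
    static List<String> gcRelated(List<String> jvmArgs) {
        List<String> out = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-XX:") && arg.contains("GC")) {
                out.add(arg);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The default number of parallel GC threads is derived from this value
        System.out.println("CPUs: " + Runtime.getRuntime().availableProcessors());
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("GC switches: " + gcRelated(jvmArgs));
    }
}
```

For example, starting it with -XX:+UseParallelGC -XX:ParallelGCThreads=4 should list both switches; with no GC switches the list is empty and the JVM uses its defaults.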

CMS Collector

CMS stands for Concurrent Mark Sweep. It does not compact the memory after sweeping. Most of its work runs concurrently with the application thread(s).
It goes through the following phases:
  • Initial Mark: Marks only the objects directly reachable from the GC roots, so the pause time of the application is very small.
  • Concurrent Mark: Starting from the objects just marked, follows their references and marks everything reachable, while the application keeps running.
  • Re-mark: Re-checks the objects that were added or changed by the application during Concurrent Mark. The application is paused briefly for this step.
  • Concurrent Sweep: The unreferenced objects are collected to complete the garbage collection process.
Points to remember:
  • All the steps run concurrently with the application threads except the "Initial Mark" and "Re-mark" steps, which pause the application briefly
  • After Sweep, no compaction is done to bring the live objects together. Free memory can therefore become fragmented, so allow for a larger heap if the application allocates big objects
  • This GC is used for time-critical applications that need short pauses
The command line option to request the GC is : -XX:+UseConcMarkSweepGC
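Why the missing compaction matters for big objects can be shown with a toy allocator. This is a minimal sketch, not CMS itself: the class FragmentedHeap, its first-fit strategy and the slot counts are assumptions made for illustration.

```java
/** Toy heap without compaction: enough free slots in total, but not in one piece. */
class FragmentedHeap {
    final boolean[] used;

    FragmentedHeap(int size) { used = new boolean[size]; }

    /** First-fit allocation: returns the start of a contiguous free run, or -1 if none fits. */
    int allocate(int length) {
        int run = 0;
        for (int i = 0; i < used.length; i++) {
            run = used[i] ? 0 : run + 1;
            if (run == length) {
                int start = i - length + 1;
                for (int j = start; j <= i; j++) used[j] = true;
                return start;
            }
        }
        return -1;
    }

    /** Frees a block in place; nothing is moved, just like a sweep without compaction. */
    void free(int start, int length) {
        for (int i = start; i < start + length; i++) used[i] = false;
    }

    int totalFree() {
        int free = 0;
        for (boolean u : used) if (!u) free++;
        return free;
    }

    public static void main(String[] args) {
        FragmentedHeap heap = new FragmentedHeap(8);
        for (int i = 0; i < 4; i++) heap.allocate(2);   // four 2-slot objects fill the heap
        heap.free(0, 2);                                // free the first object
        heap.free(4, 2);                                // free the third object
        System.out.println("free slots: " + heap.totalFree());
        System.out.println("4-slot allocation at: " + heap.allocate(4));
    }
}
```

Four slots are free in total, yet the 4-slot request fails because the free space is split into two separate holes. A compacting collector would merge them; without compaction, the practical remedy is more heap.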

G1 GC

G1 GC was officially released with Java 7. It was also present in Java 6, but only as an experimental feature. G1 does not lay out the heap as one fixed Young area and one fixed Old area that objects move between. Instead, as shown below, the heap is divided into blocks (regions) and memory is allocated block by block: once a block is full, allocation continues in the next block and the GC runs. G1 is intended as the long-term replacement for the CMS Collector. It is designed to keep pauses short and predictable on large heaps, although which collector is fastest depends on the application.

The command line to enable the GC is : -XX:+UseG1GC. To read more about G1 GC, follow the link

Happy Learning