Introduction
I decided to combine two software loves
of mine and perform some analysis on PCGen,
a popular Java-based open source character generator for role-playing
games. I used Censum,
our (jClarity's)
new Garbage Collection log analysis tool, to perform the analysis.
This write-up assumes you have a
passing familiarity with Garbage Collection (GC) on the JVM. If
you're not familiar with GC then I recommend you join our Friends
of jClarity programme. We're building up a knowledge
base around GC to share freely with the whole Java community, and we'd
love for you to come and validate this!
The projects
The two projects I'm using are PCGen (the project I'm doing the analysis on) and Censum (the GC analysis tool).
PCGen
PCGen is a popular character generator and maintenance program for d20 role-playing systems such as Star Wars and Dungeons & Dragons. It's a long-running project (> 10 years) consisting of a large (~750,000 LOC) Java Swing desktop tool with a ton of proprietary-format data files. Disclaimer: I'm the Chairperson for PCGen.
PCGen is a data-intensive tool. In order to drive its rules engine and to meet the requirement of a responsive UI (with lots of detail), much of this data is loaded up front and held in memory. Users have reported the following issues in the past:
- Pauses often occur when dealing with multiple characters and/or high level characters.
- When creating a high-level character, or when more than 2 characters were open, PCGen simply died. More tech-savvy users reported seeing an OutOfMemoryError in the pcgen log file.
Some work has been done to mitigate this poor performance in the latest version of PCGen (6.0), so I decided to use Censum to determine whether those changes had improved matters.
Censum
Censum is jClarity's new Garbage Collection analysis tool. Its focus is to use powerful analytics to crunch through the raw log data and give busy developers (like contributors to PCGen!) plain English answers quickly. Disclaimer: I'm the CTO of jClarity.
Censum is a new product that is free for open source projects, and of course if you wish to purchase a copy you can get a free trial license (click on Start Your Free Trial) today!
TLDR - The Conclusion
We have good news, some information and bad news.
The Good News
The positive news is that the default heap settings PCGen starts with (-Xms256m -Xmx512m) are now sized well enough to keep PCGen running. Even after creating a 5th complex character, there was no OutOfMemoryError.
Censum shows that once a full GC is run
(typically after each new character has been created), a large
percentage of the heap is recovered, and each character takes up
about 25-50MB of heap space. We can very roughly extrapolate that,
with a starting (data loaded) point of ~125MB, PCGen can
comfortably hold about 10-15 characters open at any one time without
crashing. This is perhaps not enough for a GM to have his Goblin
horde up and running, but certainly enough for most regular parties!
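For the curious, the back-of-the-envelope sum is: (512MB max heap − ~125MB loaded baseline) ÷ ~25-50MB per character ≈ 10-15 characters, hence the range above.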
The Bad News
The slightly more negative news is that Censum reports relatively high pause times in PCGen, likely triggered by too much premature promotion.
Too much premature promotion pushes
memory into the old gen space more quickly than we'd like. This can
have the knock-on effect of causing more old gen collections as well
as full GCs, naturally leading to more pauses.
See the full analysis section for extra
details on high pause times, premature promotion and what PCGen can
do about it.
Where to from here?
PCGen could follow Censum's "stop-gap" recommendation to alter the size of the young generational space. Setting the -XX:NewSize flag to ~256M alleviates the high pause times problem.
However, the longer term solution is
for PCGen to continue reducing the impact of their data structures
(in particular the representation of a player character). This is in
fact an ongoing project for PCGen today!
The technical setup
PCGen is typically run from a shell
script with the default heap settings of -Xms256m
and -Xmx512m. The script was
altered to provide the minimum set of flags required to
produce a GC log that can be analysed. The flags added to the
java command were:
-verbose:gc
-Xloggc:pcgen_gc.log
-XX:+PrintGCDetails
-XX:+PrintTenuringDistribution
- -verbose:gc and -Xloggc:pcgen_gc.log produce a basic log that outputs to a file called pcgen_gc.log.
- -XX:+PrintGCDetails provides the absolute minimum set of GC allocation information that Censum needs to perform an analysis.
- Finally, -XX:+PrintTenuringDistribution gives useful information on when objects move from the young generation spaces (eden, survivor 1 and survivor 2) to the old generation space (tenured).
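Putting that together, the altered launch command looks something like this (a sketch only - the real script differs, and the jar name here is illustrative):

    java -Xms256m -Xmx512m \
         -verbose:gc -Xloggc:pcgen_gc.log \
         -XX:+PrintGCDetails -XX:+PrintTenuringDistribution \
         -jar pcgen.jar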
All of these options have little to no
impact on a running JVM. You should always have these switched on in
Production!
PCGen was run with Oracle's Java 7u10
on an MBP running Mac OS 10.7.5, with 8GB of RAM, a 256GB SSD
and a hyper-threaded dual-core 2.8GHz i7 processor.
The PCGen activities
PCGen begins by loading up the basic game
mode and system files, plus the basic UI for loading data sources. The next
step is for the user to select which data sources
(role-playing game rule sets) to load. The popular SRD 3.5 with Monsters
set was loaded (Dungeons and Dragons 3.5e).
A character (Karianna) was
created level by level into a 20th level Wizard with fully loaded spells,
equipment, and a Cat familiar (effectively a 2nd character in PCGen).
Several more complex characters were added after that, including a
Great Wyrm Blue Dragon (loads of data!).
Analysis
I'll cover the initial data loading use case and then general ongoing usage (character creation).
Data Loading
I was curious about the memory impact of the initial data load. Although I get fast loading times thanks to my SSD, having PCGen load its data without memory problems is certainly a project goal! Here is what Censum showed in terms of heap allocation after GCs.
As you can see, the initial load of
data caused a number of young generational collections and one old
(tenured) GC event at the end of the data load. The heap usage
climbed to a max of about 325MB,
but after garbage was collected, the heap usage fell back to about
100MB. Not too bad for loading
about 15 thick rule books' worth of data!
However, data loading for PCGen is a
little like the start-up period of a web/application server such as
Tomcat. For GC analysis it's generally best to exclude this one-off
start-up period and focus on the running application.
Creating characters
Creating Karianna and advancing her to 20th level involves filling in details across 13 main tabs, ~20 sub-tabs and a good deal of data! Another 4 characters of similar complexity were created, some friendly (a Cat familiar) and some not (a Great Wyrm Blue Dragon).
A few screenshots of the process
follow:
Skills
Equipment
The Embedded Character Sheet
Censum's Analysis
On opening the log file, Censum took me
immediately to its Analytics Summary screen, which lets me know at a
glance how PCGen's garbage collection is going.
The good news
Immediately I know that:
- I have the right flags to collect enough GC data for analytics to work properly
- My GC throughput is good (PCGen spends its time running, not garbage collecting)
- There are no nasty System.gc() calls (generally not good practice).
The informational news
Memory Utilisation (of which memory leak detection is a subset) and Memory Pool Sizes are informational, as the log has not yet gathered 24 hours of data (our recommended minimum to see a full working day's cycle for an application).
The bad news
PCGen appears to have high pause times as well as a premature promotion problem. Let's dive into those a bit further!
High Pause Times
High pause times are caused by GC
collectors having to pause application threads in order to clean up
object references in a safe manner. The more object references the
collectors have to scan over and clean up, the longer the pause!
The longest pauses are usually caused by full GC collections, where
the entire Java heap (i.e. both the young and old gen spaces, when
they're getting really full) is cleaned up.
As a user I noticed pauses a couple of
times, not enough to really disturb me, but I am aware that I have
extremely good hardware and that these pause times may be
significantly worse for other users.
As Censum points out, the 0.15% of time
spent paused is not a major concern; it's the 0.666 second pause time
that's concerning. However, I remembered that the highest pause time
could come from the initial data load in PCGen. To check this,
Censum provides a graph of pause times.
The data load was the worst offender,
but there was also a good ½ second pause around each character
creation point due to a full GC. Again, ½
a second wasn't too annoying for me in the context of using PCGen,
but as Censum shows, full GCs take time, so PCGen should look to
reduce the number of full GCs.
In this case we know that we are
probably getting more full GCs than we'd like due to the other
warning that Censum gives us - too much premature promotion.
Premature Promotion
Premature promotion basically means
that objects that should be getting collected in the young gen space
are being promoted into the old gen space before their age is up.
This 'age' is known as the tenuring threshold, and is based on a
combination of software research done in the 1980s and the JVM's
runtime heuristics. Premature promotion can occur due to:
- New objects being created at a rate that overwhelms the young gen space
- Objects being created that are too large to fit into the young gen space, e.g. large contiguous blocks of memory
This has the knock-on effect of putting
pressure on the old gen space: it fills up more quickly, and therefore
more old gen collections and eventually full GCs occur, leading to
more frequent pauses.
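To make the second cause concrete, here is a tiny illustrative Java program (not PCGen code) that forces premature promotion when run with a deliberately small young gen, e.g. java -Xmx256m -Xmn8m -verbose:gc -XX:+PrintTenuringDistribution PrematurePromotion:

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative only: each 4MB array is too large to settle in a
    // small young gen, so it is promoted (or directly allocated) into
    // old gen long before reaching the tenuring threshold.
    public class PrematurePromotion {
        public static void main(String[] args) {
            List<byte[]> retained = new ArrayList<>();
            for (int i = 0; i < 32; i++) {
                retained.add(new byte[4 * 1024 * 1024]); // 4MB contiguous block
            }
            System.out.println("Retained " + (retained.size() * 4) + "MB");
        }
    }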
When I look at how soon objects should be promoted versus how early
they actually are being promoted, I get an answer immediately. The
Tenuring Summary screen shows that the tenuring threshold is set to
15 (objects can survive ~15 collections in young gen before being
promoted naturally to old gen). Note also that 100% of objects are
being prematurely promoted!
I then look at when objects are being promoted. Censum tells me that
the majority are promoted almost as soon as they arrive in young gen
(a tenuring threshold of 1). That value indicates to me that the
arriving objects are too large for the current young gen size. To
double-check, I make sure object allocation isn't extremely high:
Censum tells me that object allocation peaks at about 2.5GB/s, which
is only about ⅔ of what my MBP is capable of, so we're OK there.
Steps for PCGen to take
The next step for the PCGen team would
be to take a look at something like the histogram from jmap
to see what large objects are being created. A quick check of the
jmap output shows that we have large (multi-MB) contiguous
arrays of String, char
and byte.
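For reference, the histogram came from the standard JDK tools along these lines (<pid> is a placeholder for PCGen's process id):

    jps -l            # find the JVM process id for PCGen
    jmap -histo <pid> # per-class instance counts and total bytes

In jmap's output, class names like [C, [B and [Ljava.lang.String; denote char[], byte[] and String[] arrays respectively - which is where those multi-MB contiguous blocks showed up.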
An option for PCGen at this point is to
resize the young gen space so that these large objects fit into it,
using the -XX:NewRatio
or -XX:NewSize flags. The value
to set for either of these would depend on the results
discovered from the jmap
usage. In this case, setting -XX:NewSize
to ~256M alleviates the pauses.
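As a sketch, the stop-gap launch settings would look something like this (again, the jar name is illustrative):

    java -Xmx512m -XX:NewSize=256m -jar pcgen.jar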
However, that's generally not a great
long-term solution. The PCGen project is better off reducing its
usage of these large contiguous memory allocations. Led by Tom
Parker and James Dempsey, this is exactly what the PCGen team is
doing: moving large memory structures out of the monolithic
PlayerCharacter class into
smaller, more flexible facets.
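As a very rough sketch of that direction (class and method names here are illustrative, not PCGen's actual API), per-character state moves out of one giant class into small facets keyed by a character identifier:

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch only - not PCGen's real classes.
    final class CharID {} // opaque identity for a single character

    // A facet owns one narrow slice of character state, so it can be
    // sized, cached and released independently of everything else.
    class SkillRankFacet {
        private final Map<CharID, Map<String, Integer>> ranks =
                new HashMap<>();

        void set(CharID id, String skill, int rank) {
            Map<String, Integer> forChar = ranks.get(id);
            if (forChar == null) {
                forChar = new HashMap<>();
                ranks.put(id, forChar);
            }
            forChar.put(skill, rank);
        }

        int get(CharID id, String skill) {
            Map<String, Integer> forChar = ranks.get(id);
            Integer rank = (forChar == null) ? null : forChar.get(skill);
            return (rank == null) ? 0 : rank;
        }

        void remove(CharID id) {
            ranks.remove(id); // closing a character frees just its slice
        }
    }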
Summary
It's fantastic to see that the
incredibly hard work the PCGen volunteers have put in for the 6.0
release has paid real dividends for the end users!
A couple of years ago I was tracking
down a 30+ second pause that a PCGen user had reported when loading
up her 25th level Kobold Shaman. It took me several days to track
down that it was a GC/memory problem, as all I had were raw GC log files
(which are multithreaded, indeterminate beasts). Going forwards I'll
definitely be using tooling like Censum to solve these issues more
quickly, which means I can get on with adding new features to PCGen!
Don't forget to check out PCGen
for your d20 role-playing game needs and Censum
if you want to solve thorny GC problems!
Cheers,
Martijn (Chairperson - PCGen, CTO -
jClarity)
aka "The Diabolical Developer"
Comments

Ryan: I understand this is a blog post meant to shill Censum - but your point that being able to tune GC using only GC log files is totally wrong... especially considering the fact that you feed GC log files INTO Censum.
The free open source GCViewer doesn't have the same "wizards" that Censum has, but it does provide a very good summary of the pause times from a GC log and it makes it very easy to tune an app's GC performance.
With the results I've been seeing as we've been moving towards using G1 in production with our apps, G1 may make a lot of GC tuning a thing of the past (so by the time Censum would really hit traction in the market, G1 might eliminate the need for such a tool)
Hi Ryan,
Thanks for commenting!
Log vs Realtime processing:
------------------------
Just to be clear - you can feed the log into a number of tools or even just read it yourself if you have the time and the experience. It certainly doesn't have to be fed into Censum!
You're quite correct that post processing a log is not the only way to determine how to tune a GC. But trying to tune 'live' or in real time GC information has several disadvantages:
* Connecting to a live running JVM using the GC MXBeans actually creates its own objects (and therefore garbage) impacting the very application you're trying to tune.
* The MXBeans don't give you as much information as the logs can (it's a limited API).
* Real time analysis only gives you a short window for analysis, so you don't see the full picture of how the memory in your application is behaving over its natural cycle (e.g. a payroll application over the period of a day's work, or even a week, given the Friday afternoon timesheet rush). Tuning based only on partial data can cause major problems for your application.
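For reference, the MXBean route looks like this (the standard java.lang.management API) - note that even this light polling allocates objects of its own:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcBeanPeek {
        public static void main(String[] args) {
            // Summary counters only - far less detail than the GC log.
            for (GarbageCollectorMXBean gc :
                    ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.println(gc.getName() + ": "
                        + gc.getCollectionCount() + " collections, "
                        + gc.getCollectionTime() + "ms total");
            }
        }
    }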
Other tools vs Censum:
--------------------
GCViewer is pretty decent, as is HPJmeter - but when we polled large numbers of developers and operations folk, we discovered that a majority of them didn't have the time to learn how to interpret the results that those tools give out. They were also often dealing with an emergency production issue and wanted the tool to tell them exactly where the problem was and what to do about it.
We're trying very hard with Censum to give that definitive answer (as much as is mathematically possible) and make a solid recommendation (e.g. Set your NewSize=X). We still have some ways to go before we can recommend in all cases, but Censum is constantly learning and we're confident the accuracy and range of analytics will continue to improve as the machine learning and modelling we use gets more data.
G1:
---
Your comments here are really interesting - we haven't heard those levels of confidence about G1 from anyone, even including the Oracle guys themselves who wrote it! G1 has only just gained official blessing from Oracle as a supported GC - we'll be watching it with interest but we have our doubts that it will 'automatically' fix everything. Not even Azul claims that their pauseless C4 GC does that :-).
I noticed that GCViewer actually has some G1 support in it - so I guess there's still a need/interest to analyse G1 and tune it.
OpenJDK and the Java community at large could really benefit from your findings, please do post them to the GC and hotspot mailing lists! If G1 can be proven to be in almost all cases better than existing collectors, that's a great leap forward for everyone.
Thanks again and if you'd like an eval copy of Censum, let me know :-).
Will: It looks like you are loading a lot of those text-based config files and holding on to them. If you were more aggressive about removing references to the text config files in memory, that should help...?

Hi Will,
Thanks for your comment. So the code, very roughly:
1. Has a handle on the text file (a .lst proprietary format)
2. Loads all of the data into memory
3. Updates a Character Document Object Model (CDOM) with the data
4. Keeps some data cached as pure reference data (like a database table with static data, but in memory)
5. Releases the handle to the text file
At this stage, all of the data needs to be kept in memory in order for the CDOM to correctly model the character as it gains and loses levels, as well as having classes, abilities, templates, feats, spells, equipment and much more change its stats!
So it's hard to see what we could remove in terms of holding references - but we can certainly reduce the size of the overall CDOM per character, as well as the static data...
It's a fun engineering effort for the team!
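In code, those five steps look roughly like this (an illustrative sketch, not PCGen's actual loader):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    // Illustrative sketch only - not PCGen's real API.
    class LstLoader {
        // Step 4: parsed reference data stays resident in memory.
        private final List<String> referenceData = new ArrayList<>();

        void load(String lstFile) throws IOException {
            // Steps 1 and 5: the file handle only lives inside this block.
            try (BufferedReader in = new BufferedReader(new FileReader(lstFile))) {
                String line;
                while ((line = in.readLine()) != null) { // step 2: load the data
                    referenceData.add(line.trim()); // steps 3-4: stand-in for real parsing
                }
            }
        }
    }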
Does your Censum tool analyze logs from the IBM JVM?

Unfortunately not, no - we'll likely plan for this next year.