In 1995, they released the first JDK, which was wholly interpreted and had no multiprocessor (MP) support. In 1996, JITs were added. In 1997, JDK 1.1.3 provided support for native OS threads.
Servers are driving the most recent developments: they require millions of objects, thousands of threads, and continuous operation, and they must be scalable and Java-extensible. Currently, code running on the JDK is 2-5 times slower than C code. Some C and C++ optimizations are coming to JITs, but there are tradeoffs, because some kinds of optimization can take longer to apply than the code would take to run without them. Coming soon are:
The Solaris threads implementation is unique in that it uses the many-to-many model, in which many user-level threads are multiplexed over a smaller pool of kernel-level threads; other vendors use many-to-one or one-to-one implementations, which may not scale as well. The many-to-many model also provides better GC with thread suspension.
Garbage collection can seriously impact scalability. Long-running systems must reach equilibrium, meaning that memory is reclaimed at the same rate as it is allocated. The current GC algorithm is conservative; it must pass over all memory. In addition, it is handle-based, which is also slower because every object access goes through an extra level of indirection. To remove the handles, the VM must track all pointers and be able to relocate them as required by the GC. This will require JNI 1.1 on Solaris. The idea is to use young and old heaps, moving objects to the old heap as they mature and appear to remain in use over time.
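The young/old heap idea can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the JDK's collector: the class GenerationalHeapSketch, the PROMOTION_AGE threshold, and the "live" flag (standing in for reachability from a root) are all hypothetical names invented for the example. It shows short-lived objects being reclaimed from the young heap while an object that keeps surviving is promoted to the old heap.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy model of a two-generation heap. Hypothetical; not the JDK's actual collector. */
class GenerationalHeapSketch {
    /** Assumed threshold: young collections an object must survive before promotion. */
    static final int PROMOTION_AGE = 2;

    /** A simulated heap object; 'live' stands in for "still reachable from a root". */
    static final class Obj {
        final String name;
        boolean live = true;
        int age = 0;
        Obj(String name) { this.name = name; }
    }

    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    /** New objects always start in the young heap. */
    Obj allocate(String name) {
        Obj o = new Obj(name);
        young.add(o);
        return o;
    }

    /** Scans only the young heap and returns the number of objects reclaimed. */
    int collectYoung() {
        int reclaimed = 0;
        Iterator<Obj> it = young.iterator();
        while (it.hasNext()) {
            Obj o = it.next();
            if (!o.live) {
                it.remove();              // dead object: its memory is reclaimed
                reclaimed++;
            } else if (++o.age >= PROMOTION_AGE) {
                it.remove();              // mature survivor: move to the old heap
                old.add(o);
            }
        }
        return reclaimed;
    }

    public static void main(String[] args) {
        GenerationalHeapSketch heap = new GenerationalHeapSketch();
        heap.allocate("long-lived cache"); // stays live for the whole run
        for (int round = 0; round < 3; round++) {
            Obj temp = heap.allocate("request buffer " + round);
            temp.live = false;             // becomes garbage right away
            int reclaimed = heap.collectYoung();
            System.out.println("round " + round + ": reclaimed=" + reclaimed
                    + " young=" + heap.young.size() + " old=" + heap.old.size());
        }
    }
}
```

In each round one temporary object is allocated and one is reclaimed, which is the equilibrium described above. Once the long-lived object has been promoted, young collections no longer need to visit it.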