Consuming stack traces noticeably slower in Java 11 than Java 8

23 votes · tags: performance, java-8, java-11, jmh
asked Jan 2 at 15:15 by Gili · edited Jan 20 at 11:00 by Brian Tompsett - 汤莱恩

I was comparing the performance of JDK 8 and 11 using jmh 1.21 when I ran across some surprising numbers:



Java version: 1.8.0_192, vendor: Oracle Corporation

Benchmark                              Mode  Cnt      Score      Error  Units
MyBenchmark.throwAndConsumeStacktrace  avgt   25  21525.584 ±   58.957  ns/op


Java version: 9.0.4, vendor: Oracle Corporation

Benchmark                              Mode  Cnt      Score      Error  Units
MyBenchmark.throwAndConsumeStacktrace  avgt   25  28243.899 ±  498.173  ns/op


Java version: 10.0.2, vendor: Oracle Corporation

Benchmark                              Mode  Cnt      Score      Error  Units
MyBenchmark.throwAndConsumeStacktrace  avgt   25  28499.736 ±  215.837  ns/op


Java version: 11.0.1, vendor: Oracle Corporation

Benchmark                              Mode  Cnt      Score      Error  Units
MyBenchmark.throwAndConsumeStacktrace  avgt   25  48535.766 ± 2175.753  ns/op


OpenJDK 11 and 12 perform similarly to Oracle JDK 11; I have omitted their numbers for brevity.



I understand that microbenchmarks do not indicate the performance behavior of real-life applications. Still, I'm curious where this difference is coming from. Any ideas?





Here is the benchmark in its entirety:



pom.xml:



<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>jmh</groupId>
    <artifactId>consume-stacktrace</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>
    <name>JMH benchmark sample: Java</name>

    <dependencies>
        <dependency>
            <groupId>org.openjdk.jmh</groupId>
            <artifactId>jmh-core</artifactId>
            <version>${jmh.version}</version>
        </dependency>
        <dependency>
            <groupId>org.openjdk.jmh</groupId>
            <artifactId>jmh-generator-annprocess</artifactId>
            <version>${jmh.version}</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <jmh.version>1.21</jmh.version>
        <javac.target>1.8</javac.target>
        <uberjar.name>benchmarks</uberjar.name>
    </properties>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-enforcer-plugin</artifactId>
                <version>1.4.1</version>
                <executions>
                    <execution>
                        <id>enforce-versions</id>
                        <goals>
                            <goal>enforce</goal>
                        </goals>
                        <configuration>
                            <rules>
                                <requireMavenVersion>
                                    <version>3.0</version>
                                </requireMavenVersion>
                            </rules>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.0</version>
                <configuration>
                    <compilerVersion>${javac.target}</compilerVersion>
                    <source>${javac.target}</source>
                    <target>${javac.target}</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <finalName>${uberjar.name}</finalName>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>org.openjdk.jmh.Main</mainClass>
                                </transformer>
                            </transformers>
                            <filters>
                                <filter>
                                    <!--
                                    Shading signed JARs will fail without this.
                                    http://stackoverflow.com/questions/999489/invalid-signature-file-when-attempting-to-run-a-jar
                                    -->
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
        <pluginManagement>
            <plugins>
                <plugin>
                    <artifactId>maven-clean-plugin</artifactId>
                    <version>2.6.1</version>
                </plugin>
                <plugin>
                    <artifactId>maven-deploy-plugin</artifactId>
                    <version>2.8.2</version>
                </plugin>
                <plugin>
                    <artifactId>maven-install-plugin</artifactId>
                    <version>2.5.2</version>
                </plugin>
                <plugin>
                    <artifactId>maven-jar-plugin</artifactId>
                    <version>3.1.0</version>
                </plugin>
                <plugin>
                    <artifactId>maven-javadoc-plugin</artifactId>
                    <version>3.0.0</version>
                </plugin>
                <plugin>
                    <artifactId>maven-resources-plugin</artifactId>
                    <version>3.1.0</version>
                </plugin>
                <plugin>
                    <artifactId>maven-site-plugin</artifactId>
                    <version>3.7.1</version>
                </plugin>
                <plugin>
                    <artifactId>maven-source-plugin</artifactId>
                    <version>3.0.1</version>
                </plugin>
                <plugin>
                    <artifactId>maven-surefire-plugin</artifactId>
                    <version>2.22.0</version>
                </plugin>
            </plugins>
        </pluginManagement>
    </build>
</project>


src/main/java/jmh/MyBenchmark.java:



package jmh;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.infra.Blackhole;

import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class MyBenchmark
{
    @Benchmark
    public void throwAndConsumeStacktrace(Blackhole bh)
    {
        try
        {
            throw new IllegalArgumentException("I love benchmarks");
        }
        catch (IllegalArgumentException e)
        {
            StringWriter sw = new StringWriter();
            e.printStackTrace(new PrintWriter(sw));
            bh.consume(sw.toString());
        }
    }
}
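
As the comments below note, it would help to separate the cost of materializing the stack trace from the cost of rendering it to a string. A hypothetical companion method for the same class (not part of the original run) could consume the raw frames while skipping the PrintWriter formatting:

@Benchmark
public void throwAndConsumeRawFrames(Blackhole bh)
{
    try
    {
        throw new IllegalArgumentException("I love benchmarks");
    }
    catch (IllegalArgumentException e)
    {
        // getStackTrace() still forces the VM to materialize the
        // StackTraceElements, but skips printStackTrace()'s
        // string-formatting path entirely.
        bh.consume(e.getStackTrace());
    }
}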


Here is the Windows-specific script I use. It should be trivial to translate it to other platforms:



set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_192
call mvn -V -Djavac.target=1.8 clean install
"%JAVA_HOME%\bin\java" -jar target\benchmarks.jar

set JAVA_HOME=C:\Program Files\Java\jdk-9.0.4
call mvn -V -Djavac.target=9 clean install
"%JAVA_HOME%\bin\java" -jar target\benchmarks.jar

set JAVA_HOME=C:\Program Files\Java\jdk-10.0.2
call mvn -V -Djavac.target=10 clean install
"%JAVA_HOME%\bin\java" -jar target\benchmarks.jar

set JAVA_HOME=C:\Program Files\Java\oracle-11.0.1
call mvn -V -Djavac.target=11 clean install
"%JAVA_HOME%\bin\java" -jar target\benchmarks.jar


My runtime environment is:



Apache Maven 3.6.0 (97c98ec64a1fdfee7767ce5ffb20918da4f719f3; 2018-10-24T14:41:47-04:00)
Maven home: C:\Program Files\apache-maven-3.6.0\bin\..
Default locale: en_CA, platform encoding: Cp1252
OS name: "windows 10", version: "10.0", arch: "amd64", family: "windows"


More specifically, I am running Microsoft Windows [Version 10.0.17763.195].


  • It would also be good if you could compare it with the results of a case that avoids printStackTrace, as pointed out by Alan. – Naman, Jan 2 at 16:03

  • Educated guess: if we profile this thing, it would show up as bugs.openjdk.java.net/browse/JDK-8151751 – Aleksey Shipilev, Jan 2 at 18:16

  • @AlekseyShipilev bugs.openjdk.java.net/browse/JDK-8150778 is targeted at 9, so that could explain the 8 -> 9 regression, but not the 10 -> 11 regression, right? – Jorn Vernee, Jan 2 at 18:36

  • @JornVernee: Further educated guess: the 8->9 regression is switching to StackWalker, which ends up interning lots of strings, and thus StringTable is the bottleneck; 10->11 is switching StringTable to a concurrent hash table inside the VM. I'd suspect JDK-8151751 would handle both... – Aleksey Shipilev, Jan 2 at 19:10

  • Yup, see my answer. – Aleksey Shipilev, Jan 2 at 23:56

2 Answers


Answer by apangin (29 votes), answered Jan 3 at 2:42, edited Jan 3 at 3:06:

I investigated the issue with async-profiler, which can draw cool flame graphs demonstrating where the CPU time is spent.



As @AlekseyShipilev pointed out, the slowdown between JDK 8 and JDK 9 is mainly the result of the StackWalker changes. Also, G1 became the default GC in JDK 9; if we explicitly set -XX:+UseParallelGC (the default in JDK 8), the scores get slightly better.
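
For anyone reproducing this, the collector can also be pinned from the benchmark itself rather than the command line. A minimal sketch, added to the benchmark class from the question (jvmArgsAppend is standard JMH API; the fork count shown is illustrative):

import org.openjdk.jmh.annotations.Fork;

// Appending -XX:+UseParallelGC to every forked JVM keeps the collector
// constant across JDK 8..11, so GC choice does not confound the comparison.
@Fork(value = 5, jvmArgsAppend = {"-XX:+UseParallelGC"})
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class MyBenchmark
{
    // benchmark method unchanged from the question
}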



But the most interesting part is the slowdown in JDK 11.

Here is what async-profiler shows (clickable SVGs in the original post):



[flame graph: JDK 10]

[flame graph: JDK 11]



The main difference between the two profiles is the size of the java_lang_Throwable::get_stack_trace_elements block, which is dominated by StringTable::intern. Apparently, StringTable::intern takes much longer on JDK 11.



Let's zoom in:



[flame graph: JDK 11, zoomed in]



Note that StringTable::intern in JDK 11 calls do_intern, which in turn allocates a new java.lang.String object. Looks suspicious. Nothing of this kind is seen in the JDK 10 profile. Time to look at the source code.



stringTable.cpp (JDK 11)



oop StringTable::intern(Handle string_or_null_h, jchar* name, int len, TRAPS) {
  // shared table always uses java_lang_String::hash_code
  unsigned int hash = java_lang_String::hash_code(name, len);
  oop found_string = StringTable::the_table()->lookup_shared(name, len, hash);
  if (found_string != NULL) {
    return found_string;
  }
  if (StringTable::_alt_hash) {
    hash = hash_string(name, len, true);
  }
  // falls through to do_intern:
  return StringTable::the_table()->do_intern(string_or_null_h, name, len,
                                             hash, CHECK_NULL);
}

oop StringTable::do_intern(Handle string_or_null_h, const jchar* name,
                           int len, uintx hash, TRAPS) {
  HandleMark hm(THREAD);  // cleanup strings created
  Handle string_h;

  if (!string_or_null_h.is_null()) {
    string_h = string_or_null_h;
  } else {
    string_h = java_lang_String::create_from_unicode(name, len, CHECK_NULL);
  }
  // ...


The function in JDK 11 first looks for the string in the shared StringTable, does not find it, then goes to do_intern, which immediately creates a new String object.



In the JDK 10 sources, after the call to lookup_shared there was an additional lookup in the main table, which returned the existing string without creating a new object:



  found_string = the_table()->lookup_in_main_table(index, name, len, hashValue);


This refactoring was a result of JDK-8195097 "Make it possible to process StringTable outside safepoint".




TL;DR: While interning method names in JDK 11, HotSpot creates redundant String objects. This regression appeared with JDK-8195097.
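
As a side note, the interning itself is observable from plain Java: if the VM interns the method-name string it stores in a StackTraceElement, then intern() should return the very instance we already hold. A small check along these lines (an expectation based on the sources above, worth verifying on your own build):

public class InternCheck
{
    public static void main(String[] args)
    {
        StackTraceElement top = new Throwable().getStackTrace()[0];
        String method = top.getMethodName();
        // If the VM interned the name (the StringTable::intern frames in
        // the flame graphs), intern() returns the instance we already hold.
        System.out.println(method == method.intern());
    }
}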







  • Nice! I did a quick experiment and can get a 1.33x speedup in the microbenchmark by adding back a lookup using the jchar* before calling do_intern; filed bugs.openjdk.java.net/browse/JDK-8216049 referencing your answer. – Claes Redestad, Jan 3 at 9:56

Answer by Aleksey Shipilev (25 votes):

I suspect this is due to several changes.



The 8->9 regression happened when stack trace generation switched to StackWalker (JDK-8150778). Unfortunately, this made the VM native code intern a lot of strings, and StringTable became the bottleneck. If you profile the OP's benchmark, you will see a profile like the one in JDK-8151751. It should be enough to perf record -g the entire JVM that runs the benchmark, and then look into perf report. (Hint, hint: you can do it yourself next time!)
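
If setting up perf by hand is inconvenient, JMH can attach a perf-based profiler itself. A sketch using the Runner API (LinuxPerfProfiler wraps perf stat counters, so it is coarser than the perf record -g / perf report flow described above, and it assumes a Linux host with perf installed):

import org.openjdk.jmh.profile.LinuxPerfProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class ProfiledRun
{
    public static void main(String[] args) throws RunnerException
    {
        Options opt = new OptionsBuilder()
                .include("MyBenchmark")               // regex over benchmark names
                .addProfiler(LinuxPerfProfiler.class) // same idea as "-prof perf" on the CLI
                .build();
        new Runner(opt).run();
    }
}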



The 10->11 regression must have happened later. I suspect it is due to StringTable preparations for switching to a fully concurrent hash table (JDK-8195100 which, as Claes points out, is not entirely in 11), or something else (class data sharing changes?).



Either way, interning on the fast path is a bad idea, and the patch for JDK-8151751 should deal with both regressions.



Watch this:



8u191: 15108 ± 99 ns/op [so far so good]



-   54.55%     0.37%  java     libjvm.so           [.] JVM_GetStackTraceElement
   - 54.18% JVM_GetStackTraceElement
      - 52.22% java_lang_Throwable::get_stack_trace_element
         - 48.23% java_lang_StackTraceElement::create
            - 17.82% StringTable::intern
               - 13.92% StringTable::intern
            - 4.83% Klass::external_name
            + 3.41% Method::line_number_from_bci


"head": 22382 ± 134 ns/op [regression]



-   69.79%     0.05%  org.sample.MyBe  libjvm.so  [.] JVM_InitStackTraceElement
   - 69.73% JVM_InitStackTraceElementArray
      - 69.14% java_lang_Throwable::get_stack_trace_elements
         - 66.86% java_lang_StackTraceElement::fill_in
            - 38.48% StringTable::intern
               - 21.81% StringTable::intern
            - 2.21% Klass::external_name
              1.82% Method::line_number_from_bci
              0.97% AccessInternal::PostRuntimeDispatch<G1BarrierSet::AccessBarrier<573


"head" + JDK-8151751 patch: 7511 ± 26 ns/op [woot, even better than 8u]



-   22.53%     0.12%  org.sample.MyBe  libjvm.so  [.] JVM_InitStackTraceElement
   - 22.40% JVM_InitStackTraceElementArray
      - 20.25% java_lang_Throwable::get_stack_trace_elements
         - 12.69% java_lang_StackTraceElement::fill_in
            + 6.86% Method::line_number_from_bci
              2.08% AccessInternal::PostRuntimeDispatch<G1BarrierSet::AccessBarrier
              2.24% InstanceKlass::method_with_orig_idnum
              1.03% Handle::Handle





  • jmh profilers are still a bit of a mystery to me, to be honest. The Windows profiler hangs for a very long time at the end of every phase, and I also lack the ability to interpret ASM code as you did here. – Gili, Jan 3 at 0:06

  • This answer concerns the bottleneck in the native JVM code, so it used vanilla Linux perf, which is supposed to work on any modern Linux distribution. While the precise answer does require expertise and experience, you can still get very close to the answer just by using appropriate tools, and it will also save time for others to get there. That is, asking "what are these differences in profiles?" is the level-up from asking "why is performance different?" – Aleksey Shipilev, Jan 3 at 0:15

  • JDK-8195100 and friends aren't in 11, though. – Claes Redestad, Jan 3 at 0:28

  • @ClaesRedestad: Yup, edited the answer. Maybe that's StringTable and/or CDS or something. – Aleksey Shipilev, Jan 3 at 0:31

  • +1. I think I found the root cause (it was too long for a comment). Your supposition about StringTable preparations was quite right. – apangin, Jan 3 at 3:22

Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54008778%2fconsuming-stack-traces-noticeably-slower-in-java-11-than-java-8%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes









29














I investigated the issue with async-profiler which can draw cool flame graphs demonstrating where the CPU time is spent.



As @AlekseyShipilev pointed out, the slowdown between JDK 8 and JDK 9 is mainly the result of StackWalker changes. Also G1 has become the default GC since JDK 9. If we explicitly set -XX:+UseParallelGC (default in JDK 8), the scores will be slightly better.



But the most interesting part is the slowdown in JDK 11.

Here is what async-profiler shows (clickable SVG).



JDK 10



JDK 11



The main difference between two profiles is in the size of java_lang_Throwable::get_stack_trace_elements block, which is dominated by StringTable::intern. Apparently StringTable::intern takes much longer on JDK 11.



Let's zoom in:



JDK 11 zoom in



Note that StringTable::intern in JDK 11 calls do_intern which in turn allocates a new java.lang.String object. Looks suspicious. Nothing of this kind is seen in JDK 10 profile. Time to look in the source code.



stringTable.cpp (JDK 11)



oop StringTable::intern(Handle string_or_null_h, jchar* name, int len, TRAPS) {
// shared table always uses java_lang_String::hash_code
unsigned int hash = java_lang_String::hash_code(name, len);
oop found_string = StringTable::the_table()->lookup_shared(name, len, hash);
if (found_string != NULL) {
return found_string;
}
if (StringTable::_alt_hash) {
hash = hash_string(name, len, true);
}
return StringTable::the_table()->do_intern(string_or_null_h, name, len,
| hash, CHECK_NULL);
} |
----------------
|
v
oop StringTable::do_intern(Handle string_or_null_h, const jchar* name,
int len, uintx hash, TRAPS) {
HandleMark hm(THREAD); // cleanup strings created
Handle string_h;

if (!string_or_null_h.is_null()) {
string_h = string_or_null_h;
} else {
string_h = java_lang_String::create_from_unicode(name, len, CHECK_NULL);
}


The function in JDK 11 first looks for a string in the shared StringTable, does not find it, then goes to do_intern and immediately creates a new String object.



In JDK 10 sources after a call to lookup_shared there was an additional lookup in the main table which returned the existing string without creation of a new object:



  found_string = the_table()->lookup_in_main_table(index, name, len, hashValue);


This refactoring was a result of JDK-8195097 "Make it possible to process StringTable outside safepoint".




TL;DR While interning method names in JDK 11, HotSpot creates redundant String objects. This has happened after JDK-8195097.







share|improve this answer





















  • 4





    Nice! I did a quick experiment and can get 1.33x speedup in the microbenchmark by adding back in a lookup using the jchar* before calling do_intern, filed bugs.openjdk.java.net/browse/JDK-8216049 referencing your answer.

    – Claes Redestad
    Jan 3 at 9:56
















29














I investigated the issue with async-profiler which can draw cool flame graphs demonstrating where the CPU time is spent.



As @AlekseyShipilev pointed out, the slowdown between JDK 8 and JDK 9 is mainly the result of StackWalker changes. Also G1 has become the default GC since JDK 9. If we explicitly set -XX:+UseParallelGC (default in JDK 8), the scores will be slightly better.



But the most interesting part is the slowdown in JDK 11.

Here is what async-profiler shows (clickable SVG).



JDK 10



JDK 11



The main difference between two profiles is in the size of java_lang_Throwable::get_stack_trace_elements block, which is dominated by StringTable::intern. Apparently StringTable::intern takes much longer on JDK 11.



Let's zoom in:



JDK 11 zoom in



Note that StringTable::intern in JDK 11 calls do_intern which in turn allocates a new java.lang.String object. Looks suspicious. Nothing of this kind is seen in JDK 10 profile. Time to look in the source code.



stringTable.cpp (JDK 11)



oop StringTable::intern(Handle string_or_null_h, jchar* name, int len, TRAPS) {
// shared table always uses java_lang_String::hash_code
unsigned int hash = java_lang_String::hash_code(name, len);
oop found_string = StringTable::the_table()->lookup_shared(name, len, hash);
if (found_string != NULL) {
return found_string;
}
if (StringTable::_alt_hash) {
hash = hash_string(name, len, true);
}
return StringTable::the_table()->do_intern(string_or_null_h, name, len,
| hash, CHECK_NULL);
} |
----------------
|
v
oop StringTable::do_intern(Handle string_or_null_h, const jchar* name,
int len, uintx hash, TRAPS) {
HandleMark hm(THREAD); // cleanup strings created
Handle string_h;

if (!string_or_null_h.is_null()) {
string_h = string_or_null_h;
} else {
string_h = java_lang_String::create_from_unicode(name, len, CHECK_NULL);
}


The function in JDK 11 first looks for a string in the shared StringTable, does not find it, then goes to do_intern and immediately creates a new String object.



In JDK 10 sources after a call to lookup_shared there was an additional lookup in the main table which returned the existing string without creation of a new object:



  found_string = the_table()->lookup_in_main_table(index, name, len, hashValue);


This refactoring was a result of JDK-8195097 "Make it possible to process StringTable outside safepoint".




TL;DR While interning method names in JDK 11, HotSpot creates redundant String objects. This has happened after JDK-8195097.







share|improve this answer





















  • 4





    Nice! I did a quick experiment and can get 1.33x speedup in the microbenchmark by adding back in a lookup using the jchar* before calling do_intern, filed bugs.openjdk.java.net/browse/JDK-8216049 referencing your answer.

    – Claes Redestad
    Jan 3 at 9:56














29












29








29







I investigated the issue with async-profiler which can draw cool flame graphs demonstrating where the CPU time is spent.



As @AlekseyShipilev pointed out, the slowdown between JDK 8 and JDK 9 is mainly the result of StackWalker changes. Also G1 has become the default GC since JDK 9. If we explicitly set -XX:+UseParallelGC (default in JDK 8), the scores will be slightly better.



But the most interesting part is the slowdown in JDK 11.

Here is what async-profiler shows (clickable SVG).



JDK 10



JDK 11



The main difference between two profiles is in the size of java_lang_Throwable::get_stack_trace_elements block, which is dominated by StringTable::intern. Apparently StringTable::intern takes much longer on JDK 11.



Let's zoom in:



JDK 11 zoom in



Note that StringTable::intern in JDK 11 calls do_intern which in turn allocates a new java.lang.String object. Looks suspicious. Nothing of this kind is seen in JDK 10 profile. Time to look in the source code.



stringTable.cpp (JDK 11)



oop StringTable::intern(Handle string_or_null_h, jchar* name, int len, TRAPS) {
// shared table always uses java_lang_String::hash_code
unsigned int hash = java_lang_String::hash_code(name, len);
oop found_string = StringTable::the_table()->lookup_shared(name, len, hash);
if (found_string != NULL) {
return found_string;
}
if (StringTable::_alt_hash) {
hash = hash_string(name, len, true);
}
return StringTable::the_table()->do_intern(string_or_null_h, name, len,
| hash, CHECK_NULL);
} |
----------------
|
v
oop StringTable::do_intern(Handle string_or_null_h, const jchar* name,
int len, uintx hash, TRAPS) {
HandleMark hm(THREAD); // cleanup strings created
Handle string_h;

if (!string_or_null_h.is_null()) {
string_h = string_or_null_h;
} else {
string_h = java_lang_String::create_from_unicode(name, len, CHECK_NULL);
}


The function in JDK 11 first looks for a string in the shared StringTable, does not find it, then goes to do_intern and immediately creates a new String object.



In JDK 10 sources after a call to lookup_shared there was an additional lookup in the main table which returned the existing string without creation of a new object:



  found_string = the_table()->lookup_in_main_table(index, name, len, hashValue);


This refactoring was a result of JDK-8195097 "Make it possible to process StringTable outside safepoint".




TL;DR While interning method names in JDK 11, HotSpot creates redundant String objects. This has happened after JDK-8195097.







share|improve this answer















I investigated the issue with async-profiler which can draw cool flame graphs demonstrating where the CPU time is spent.



As @AlekseyShipilev pointed out, the slowdown between JDK 8 and JDK 9 is mainly the result of StackWalker changes. Also G1 has become the default GC since JDK 9. If we explicitly set -XX:+UseParallelGC (default in JDK 8), the scores will be slightly better.



But the most interesting part is the slowdown in JDK 11.

Here is what async-profiler shows (clickable SVG).



JDK 10



JDK 11



The main difference between two profiles is in the size of java_lang_Throwable::get_stack_trace_elements block, which is dominated by StringTable::intern. Apparently StringTable::intern takes much longer on JDK 11.



Let's zoom in:



JDK 11 zoom in



Note that StringTable::intern in JDK 11 calls do_intern which in turn allocates a new java.lang.String object. Looks suspicious. Nothing of this kind is seen in JDK 10 profile. Time to look in the source code.



stringTable.cpp (JDK 11)



oop StringTable::intern(Handle string_or_null_h, jchar* name, int len, TRAPS) {
// shared table always uses java_lang_String::hash_code
unsigned int hash = java_lang_String::hash_code(name, len);
oop found_string = StringTable::the_table()->lookup_shared(name, len, hash);
if (found_string != NULL) {
return found_string;
}
if (StringTable::_alt_hash) {
hash = hash_string(name, len, true);
}
return StringTable::the_table()->do_intern(string_or_null_h, name, len,
| hash, CHECK_NULL);
} |
----------------
|
v
oop StringTable::do_intern(Handle string_or_null_h, const jchar* name,
int len, uintx hash, TRAPS) {
HandleMark hm(THREAD); // cleanup strings created
Handle string_h;

if (!string_or_null_h.is_null()) {
string_h = string_or_null_h;
} else {
string_h = java_lang_String::create_from_unicode(name, len, CHECK_NULL);
}


The function in JDK 11 first looks for a string in the shared StringTable, does not find it, then goes to do_intern and immediately creates a new String object.



In JDK 10 sources after a call to lookup_shared there was an additional lookup in the main table which returned the existing string without creation of a new object:



  found_string = the_table()->lookup_in_main_table(index, name, len, hashValue);


This refactoring was a result of JDK-8195097 "Make it possible to process StringTable outside safepoint".




TL;DR While interning method names in JDK 11, HotSpot creates redundant String objects. This has happened after JDK-8195097.








share|improve this answer














share|improve this answer



share|improve this answer








edited Jan 3 at 3:06

























answered Jan 3 at 2:42









apanginapangin

52.8k8107137




52.8k8107137








  • 4





    Nice! I did a quick experiment and can get 1.33x speedup in the microbenchmark by adding back in a lookup using the jchar* before calling do_intern, filed bugs.openjdk.java.net/browse/JDK-8216049 referencing your answer.

    – Claes Redestad
    Jan 3 at 9:56














  • 4





    Nice! I did a quick experiment and can get 1.33x speedup in the microbenchmark by adding back in a lookup using the jchar* before calling do_intern, filed bugs.openjdk.java.net/browse/JDK-8216049 referencing your answer.

    – Claes Redestad
    Jan 3 at 9:56








4




4





Nice! I did a quick experiment and can get 1.33x speedup in the microbenchmark by adding back in a lookup using the jchar* before calling do_intern, filed bugs.openjdk.java.net/browse/JDK-8216049 referencing your answer.

– Claes Redestad
Jan 3 at 9:56





Nice! I did a quick experiment and can get 1.33x speedup in the microbenchmark by adding back in a lookup using the jchar* before calling do_intern, filed bugs.openjdk.java.net/browse/JDK-8216049 referencing your answer.

– Claes Redestad
Jan 3 at 9:56













25














I suspect this is due to several changes.



8->9 regression happened while switching to StackWalker for generating the stack traces (JDK-8150778). Unfortunately, this made VM native code intern a lot of strings, and StringTable becomes the bottleneck. If you profile OP's benchmark, you will see the profile like in JDK-8151751. It should be enough to perf record -g the entire JVM that runs the benchmark, and then look into perf report. (Hint, hint, you can do it yourself next time!)



And 10->11 regression must have happened later. I suspect this is due to StringTable preparations for switching to fully concurrent hash table (JDK-8195100, which, as Claes points out, is not entirely in 11) or something else (class data sharing changes?).



Either way, interning on fast path is a bad idea, and patch for JDK-8151751 should have dealt with both regressions.



Watch this:



8u191: 15108 ± 99 ns/op [so far so good]



-   54.55%     0.37%  java     libjvm.so           [.] JVM_GetStackTraceElement 
- 54.18% JVM_GetStackTraceElement
- 52.22% java_lang_Throwable::get_stack_trace_element
- 48.23% java_lang_StackTraceElement::create
- 17.82% StringTable::intern
- 13.92% StringTable::intern
- 4.83% Klass::external_name
+ 3.41% Method::line_number_from_bci


"head": 22382 ± 134 ns/op [regression]



-   69.79%     0.05%  org.sample.MyBe  libjvm.so  [.] JVM_InitStackTraceElement
- 69.73% JVM_InitStackTraceElementArray
- 69.14% java_lang_Throwable::get_stack_trace_elements
- 66.86% java_lang_StackTraceElement::fill_in
- 38.48% StringTable::intern
- 21.81% StringTable::intern
- 2.21% Klass::external_name
1.82% Method::line_number_from_bci
0.97% AccessInternal::PostRuntimeDispatch<G1BarrierSet::AccessBarrier<573


"head" + JDK-8151751 patch: 7511 ± 26 ns/op [woot, even better than 8u]



-   22.53%     0.12%  org.sample.MyBe  libjvm.so  [.] JVM_InitStackTraceElement
- 22.40% JVM_InitStackTraceElementArray
- 20.25% java_lang_Throwable::get_stack_trace_elements
- 12.69% java_lang_StackTraceElement::fill_in
+ 6.86% Method::line_number_from_bci
2.08% AccessInternal::PostRuntimeDispatch<G1BarrierSet::AccessBarrier
2.24% InstanceKlass::method_with_orig_idnum
1.03% Handle::Handle





share|improve this answer





















  • 1





    jmh profilers are still a bit of a mystery to me to be honest. The Windows profiler hangs for a very long time at the end of every phase, and I also lack the ability to interpret ASM code as you did here.

    – Gili
    Jan 3 at 0:06








  • 4





    This answer concerns the bottleneck in the native JVM code, so it used the vanilla Linux perf, which is supposed to work on any modern Linux distribution. While the precise answer does require expertise and experience, you can still get very close to the answer just by using appropriate tools, and it will also save time for others to get there. That is, asking "what are these differences in profiles?" is the level-up from asking "why performance is different?"

    – Aleksey Shipilev
    Jan 3 at 0:15








  • 1





    JDK-8195100 and friends aren't in 11, though.

    – Claes Redestad
    Jan 3 at 0:28











  • @ClaesRedestad: Yup, edited the answer. Maybe that's StringTable and/or CDS or something.

    – Aleksey Shipilev
    Jan 3 at 0:31






  • 2





    +1. I think I found the root cause (was too long for comment). Your supposition about StringTable preparations was quite right.

    – apangin
    Jan 3 at 3:22
















25














I suspect this is due to several changes.



8->9 regression happened while switching to StackWalker for generating the stack traces (JDK-8150778). Unfortunately, this made VM native code intern a lot of strings, and StringTable becomes the bottleneck. If you profile OP's benchmark, you will see the profile like in JDK-8151751. It should be enough to perf record -g the entire JVM that runs the benchmark, and then look into perf report. (Hint, hint, you can do it yourself next time!)



And 10->11 regression must have happened later. I suspect this is due to StringTable preparations for switching to fully concurrent hash table (JDK-8195100, which, as Claes points out, is not entirely in 11) or something else (class data sharing changes?).



Either way, interning on fast path is a bad idea, and patch for JDK-8151751 should have dealt with both regressions.



Watch this:



8u191: 15108 ± 99 ns/op [so far so good]



-   54.55%     0.37%  java     libjvm.so           [.] JVM_GetStackTraceElement 
- 54.18% JVM_GetStackTraceElement
- 52.22% java_lang_Throwable::get_stack_trace_element
- 48.23% java_lang_StackTraceElement::create
- 17.82% StringTable::intern
- 13.92% StringTable::intern
- 4.83% Klass::external_name
+ 3.41% Method::line_number_from_bci


"head": 22382 ± 134 ns/op [regression]



-   69.79%     0.05%  org.sample.MyBe  libjvm.so  [.] JVM_InitStackTraceElement
   - 69.73% JVM_InitStackTraceElementArray
      - 69.14% java_lang_Throwable::get_stack_trace_elements
         - 66.86% java_lang_StackTraceElement::fill_in
            - 38.48% StringTable::intern
               - 21.81% StringTable::intern
            - 2.21% Klass::external_name
              1.82% Method::line_number_from_bci
              0.97% AccessInternal::PostRuntimeDispatch<G1BarrierSet::AccessBarrier<573


"head" + JDK-8151751 patch: 7511 ± 26 ns/op [woot, even better than 8u]



-   22.53%     0.12%  org.sample.MyBe  libjvm.so  [.] JVM_InitStackTraceElement
   - 22.40% JVM_InitStackTraceElementArray
      - 20.25% java_lang_Throwable::get_stack_trace_elements
         - 12.69% java_lang_StackTraceElement::fill_in
            + 6.86% Method::line_number_from_bci
              2.08% AccessInternal::PostRuntimeDispatch<G1BarrierSet::AccessBarrier
              2.24% InstanceKlass::method_with_orig_idnum
              1.03% Handle::Handle





answered Jan 2 at 23:52, edited Jan 3 at 0:30
– Aleksey Shipilev
  • jmh profilers are still a bit of a mystery to me, to be honest. The Windows profiler hangs for a very long time at the end of every phase, and I also lack the ability to interpret ASM code the way you did here. – Gili, Jan 3 at 0:06

  • This answer concerns a bottleneck in native JVM code, so it used vanilla Linux perf, which is supposed to work on any modern Linux distribution. While the precise answer does require expertise and experience, you can get very close just by using the appropriate tools, and that also saves others time in getting there. That is, asking "what are these differences in profiles?" is a level up from asking "why is performance different?" – Aleksey Shipilev, Jan 3 at 0:15

  • JDK-8195100 and friends aren't in 11, though. – Claes Redestad, Jan 3 at 0:28

  • @ClaesRedestad: Yup, edited the answer. Maybe that's StringTable and/or CDS or something. – Aleksey Shipilev, Jan 3 at 0:31

  • +1. I think I found the root cause (it was too long for a comment). Your supposition about StringTable preparations was quite right. – apangin, Jan 3 at 3:22
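
Incidentally, you do not have to drive perf by hand: on Linux, JMH can attach it for you via its built-in profilers. A sketch, using profiler names that exist in JMH 1.21:

java -jar benchmarks.jar MyBenchmark.throwAndConsumeStacktrace -prof perfnorm
java -jar benchmarks.jar MyBenchmark.throwAndConsumeStacktrace -prof perfasm

perfnorm prints hardware counters normalized per benchmark operation, and perfasm attributes hot samples to the generated assembly; both are Linux-only.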













