JMeter Halts all Activity for 60 seconds

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

JMeter Halts all Activity for 60 seconds

Alistair Evans
Hi,

I'm encountering a strange problem when running our JMeter test plan.

I effectively get a halt to all JMeter activity for up to 60 seconds (a strangely round number) in the middle of the test. JMeter treats this as elapsed time when it processes the HTTP responses after the delay, so we get a spike in our response times graph (screenshot available here: https://imgur.com/ayKTzOU).

Test Description: A typical test contains 1000 users, over 2 hours, with a 15 minute ramp time on requests. Randomised delays are present between all requests.

Platform Details from JMeter Log:
Version 5.1.1 r1855137
java.version=1.8.0_211
java.vm.name=Java HotSpot(TM) 64-Bit Server VM
os.name=Windows 10
os.arch=amd64
os.version=10.0
file.encoding=Cp1252
java.awt.headless=true
Max memory     =7654604800
Available Processors =4
Default Locale=English (EN)
JMeter  Locale=English (EN)

During this window of zero activity (in these examples between 11:38:44 and 11:39:44), I have determined the following:


-          During this window there are zero HTTP requests being processed by the application servers.



-          After the delay, JMeter issues any requests whose delay timers have passed (no timers are longer than 1 or 2 seconds), causing a spike in requests.


-          Processor usage on the load source box where JMeter is running drops to almost zero during this period (you can see the perfmon capture for the test on the JMeter box here: https://imgur.com/vgVhB1X).



-          Disk queue length is also zero, which suggests it isn't a case of Windows blocking a process when it opens a file handle.


-          I've turned on GC Verbose logging, because my initial suspicion was GC STW pauses, but weirdly no GC happens at all during that period, the GC logs reports 60 seconds of application time, when normal time between GC is 1 second or less; here's an extract of the log:


Metaspace       used 53031K, capacity 83842K, committed 101364K, reserved 1112064K
  class space    used 8006K, capacity 20674K, committed 38132K, reserved 1048576K
}
[Times: user=0.19 sys=0.00, real=0.05 secs]
2019-06-17T11:38:44.243+0100: 1550.963: Total time for which application threads were stopped: 0.0541356 seconds, Stopping threads took: 0.0002249 seconds
2019-06-17T11:38:44.246+0100: 1550.967: [GC concurrent-string-deduplication, 2702.9K->1178.3K(1524.6K), avg 54.2%, 0.0038746 secs]
2019-06-17T11:39:44.846+0100: 1611.566: Application time: 60.6031019 seconds
{Heap before GC invocations=1196 (full 1):
garbage-first heap   total 7475200K, used 3054884K [0x0000000628400000, 0x0000000628607210, 0x00000007f0800000)
  region size 2048K, 256 young (524288K), 32 survivors (65536K)
Metaspace       used 53616K, capacity 85276K, committed 102388K, reserved 1114112K

-          If I turn on some of the debug logging in JMeter, (for the network activity), then JMeter does still write to the log during this period, but at a drastically reduced rate; for example, typically the PoolingHttpClientConnectionManager generates a 'Closing connections idle longer than 1 MICROSECONDS' log up to 10 times per second, but during this period it writes that message zero times.


-          Changing GC settings does have some effect on the frequency of this issue happening. With the default GC settings it happens quite often (multiple times per test). I'm currently running the following custom GC options, which reduces the frequency of the problem to about 50% probability of it happening once during a test:



-XX:+HeapDumpOnOutOfMemoryError

-Xms7299m

-Xmx7299m

-XX:MaxMetaspaceSize=256m

-XX:+UseG1GC

-XX:MaxGCPauseMillis=100

-XX:G1ReservePercent=20

-XX:+AlwaysPreTouch

-XX:+ParallelRefProcEnabled

-XX:+UseStringDeduplication

-XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses

-XX:NewSize=512m

-XX:MaxNewSize=3g

-XX:NewRatio=2

-XX:MaxMetaspaceExpansion=64M



Any clues as to additional logging to further investigate, or JMeter settings to change, would be greatly appreciated.

Thanks,

Alistair

==================================================================
Alistair Evans
Chief Software Architect for Wholesale Banking
7 Layer Solutions Limited
Tel: +44 (0)1242 225520
Email: [hidden email]<mailto:[hidden email]>
Web: www.7layer.net<http://www.7layer.net/>
==================================================================


--
This email was Virus checked by UTM 9.

This email is confidential and intended solely for the use of the individual to whom it is addressed. Any views or opinions presented are solely those of the author and do not necessarily represent those of 7 Layer Solutions Limited. If you are not the intended recipient, be advised that you have received this email in error and that any use, dissemination, forwarding, printing or copying of this email is strictly prohibited. If you have received this email in error please contact the sender.

7 Layer Solutions Limited registered in England and Wales, company registration number 3817853.
Registered offices: 17, St George’s Road, Cheltenham, Gloucestershire, GL50 3DT United Kingdom

Reply | Threaded
Open this post in threaded view
|

Re: JMeter Halts all Activity for 60 seconds

Felix Schumacher

Am 17.06.19 um 15:32 schrieb Alistair Evans:
> Hi,
>
> I'm encountering a strange problem when running our JMeter test plan.
>
> I effectively get a halt to all JMeter activity for up to 60 seconds (a strangely round number) in the middle of the test. JMeter treats this as elapsed time when it processes the HTTP responses after the delay, so we get a spike in our response times graph (screenshot available here: https://imgur.com/ayKTzOU).

Are you sure it is JMeter that is hanging for 60 seconds? Have you had a
look at the system under test? Maybe use another JMeter instance with
less load to look, if you get the same 60 seconds period of hangup.

Is this halt reproducible? If so take a few threaddumps at those times
(both on JMeter side and the server side).

>
> Test Description: A typical test contains 1000 users, over 2 hours, with a 15 minute ramp time on requests. Randomised delays are present between all requests.
>
> Platform Details from JMeter Log:
> Version 5.1.1 r1855137
> java.version=1.8.0_211
> java.vm.name=Java HotSpot(TM) 64-Bit Server VM
> os.name=Windows 10
> os.arch=amd64
> os.version=10.0
> file.encoding=Cp1252
> java.awt.headless=true
> Max memory     =7654604800
> Available Processors =4
> Default Locale=English (EN)
> JMeter  Locale=English (EN)
>
> During this window of zero activity (in these examples between 11:38:44 and 11:39:44), I have determined the following:
>
>
> -          During this window there are zero HTTP requests being processed by the application servers.
As written above, have a look at the state of the server, too.
>
>
> -          After the delay, JMeter issues any requests whose delay timers have passed (no timers are longer than 1 or 2 seconds), causing a spike in requests.
That is expected and a feature for JMeter waiting to be implemented
(that is handling these kind of overload gracefully and notifying the
user about such an overload period).
>
>
> -          Processor usage on the load source box where JMeter is running drops to almost zero during this period (you can see the perfmon capture for the test on the JMeter box here: https://imgur.com/vgVhB1X).
Not unsual, waiting on locks or the server doesn't need much cpu.

>
>
>
> -          Disk queue length is also zero, which suggests it isn't a case of Windows blocking a process when it opens a file handle.
>
>
> -          I've turned on GC Verbose logging, because my initial suspicion was GC STW pauses, but weirdly no GC happens at all during that period, the GC logs reports 60 seconds of application time, when normal time between GC is 1 second or less; here's an extract of the log:
>
>
> Metaspace       used 53031K, capacity 83842K, committed 101364K, reserved 1112064K
>   class space    used 8006K, capacity 20674K, committed 38132K, reserved 1048576K
> }
> [Times: user=0.19 sys=0.00, real=0.05 secs]
> 2019-06-17T11:38:44.243+0100: 1550.963: Total time for which application threads were stopped: 0.0541356 seconds, Stopping threads took: 0.0002249 seconds
> 2019-06-17T11:38:44.246+0100: 1550.967: [GC concurrent-string-deduplication, 2702.9K->1178.3K(1524.6K), avg 54.2%, 0.0038746 secs]
> 2019-06-17T11:39:44.846+0100: 1611.566: Application time: 60.6031019 seconds
> {Heap before GC invocations=1196 (full 1):
> garbage-first heap   total 7475200K, used 3054884K [0x0000000628400000, 0x0000000628607210, 0x00000007f0800000)
>   region size 2048K, 256 young (524288K), 32 survivors (65536K)
> Metaspace       used 53616K, capacity 85276K, committed 102388K, reserved 1114112K

Nice to now that GC -- or rather memory usage -- is not a problem here.

>
> -          If I turn on some of the debug logging in JMeter, (for the network activity), then JMeter does still write to the log during this period, but at a drastically reduced rate; for example, typically the PoolingHttpClientConnectionManager generates a 'Closing connections idle longer than 1 MICROSECONDS' log up to 10 times per second, but during this period it writes that message zero times.
Probably because all connections are still used waiting for the server.

>
>
> -          Changing GC settings does have some effect on the frequency of this issue happening. With the default GC settings it happens quite often (multiple times per test). I'm currently running the following custom GC options, which reduces the frequency of the problem to about 50% probability of it happening once during a test:
>
>
>
> -XX:+HeapDumpOnOutOfMemoryError
>
> -Xms7299m
>
> -Xmx7299m
>
> -XX:MaxMetaspaceSize=256m
>
> -XX:+UseG1GC
>
> -XX:MaxGCPauseMillis=100
>
> -XX:G1ReservePercent=20
>
> -XX:+AlwaysPreTouch
>
> -XX:+ParallelRefProcEnabled
>
> -XX:+UseStringDeduplication
>
> -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses
>
> -XX:NewSize=512m
>
> -XX:MaxNewSize=3g
>
> -XX:NewRatio=2
>
> -XX:MaxMetaspaceExpansion=64M

Wow, that are a lot of GC options. I tend to start without any GC
options (except Xmx/Xms) and only add more if there are problems with GC.


>
>
>
> Any clues as to additional logging to further investigate, or JMeter settings to change, would be greatly appreciated.

Take threaddumps when the halt is happening.

Felix

>
> Thanks,
>
> Alistair
>
> ==================================================================
> Alistair Evans
> Chief Software Architect for Wholesale Banking
> 7 Layer Solutions Limited
> Tel: +44 (0)1242 225520
> Email: [hidden email]<mailto:[hidden email]>
> Web: www.7layer.net<http://www.7layer.net/>
> ==================================================================
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: JMeter Halts all Activity for 60 seconds

deicool
Looks like a application bug..probably a deadlock

On Mon, 17 Jun 2019, 22:11 Felix Schumacher, <
[hidden email]> wrote:

>
> Am 17.06.19 um 15:32 schrieb Alistair Evans:
> > Hi,
> >
> > I'm encountering a strange problem when running our JMeter test plan.
> >
> > I effectively get a halt to all JMeter activity for up to 60 seconds (a
> strangely round number) in the middle of the test. JMeter treats this as
> elapsed time when it processes the HTTP responses after the delay, so we
> get a spike in our response times graph (screenshot available here:
> https://imgur.com/ayKTzOU).
>
> Are you sure it is JMeter that is hanging for 60 seconds? Have you had a
> look at the system under test? Maybe use another JMeter instance with
> less load to look, if you get the same 60 seconds period of hangup.
>
> Is this halt reproducible? If so take a few threaddumps at those times
> (both on JMeter side and the server side).
>
> >
> > Test Description: A typical test contains 1000 users, over 2 hours, with
> a 15 minute ramp time on requests. Randomised delays are present between
> all requests.
> >
> > Platform Details from JMeter Log:
> > Version 5.1.1 r1855137
> > java.version=1.8.0_211
> > java.vm.name=Java HotSpot(TM) 64-Bit Server VM
> > os.name=Windows 10
> > os.arch=amd64
> > os.version=10.0
> > file.encoding=Cp1252
> > java.awt.headless=true
> > Max memory     =7654604800
> > Available Processors =4
> > Default Locale=English (EN)
> > JMeter  Locale=English (EN)
> >
> > During this window of zero activity (in these examples between 11:38:44
> and 11:39:44), I have determined the following:
> >
> >
> > -          During this window there are zero HTTP requests being
> processed by the application servers.
> As written above, have a look at the state of the server, too.
> >
> >
> > -          After the delay, JMeter issues any requests whose delay
> timers have passed (no timers are longer than 1 or 2 seconds), causing a
> spike in requests.
> That is expected and a feature for JMeter waiting to be implemented
> (that is handling these kind of overload gracefully and notifying the
> user about such an overload period).
> >
> >
> > -          Processor usage on the load source box where JMeter is
> running drops to almost zero during this period (you can see the perfmon
> capture for the test on the JMeter box here: https://imgur.com/vgVhB1X).
> Not unsual, waiting on locks or the server doesn't need much cpu.
> >
> >
> >
> > -          Disk queue length is also zero, which suggests it isn't a
> case of Windows blocking a process when it opens a file handle.
> >
> >
> > -          I've turned on GC Verbose logging, because my initial
> suspicion was GC STW pauses, but weirdly no GC happens at all during that
> period, the GC logs reports 60 seconds of application time, when normal
> time between GC is 1 second or less; here's an extract of the log:
> >
> >
> > Metaspace       used 53031K, capacity 83842K, committed 101364K,
> reserved 1112064K
> >   class space    used 8006K, capacity 20674K, committed 38132K, reserved
> 1048576K
> > }
> > [Times: user=0.19 sys=0.00, real=0.05 secs]
> > 2019-06-17T11:38:44.243+0100: 1550.963: Total time for which application
> threads were stopped: 0.0541356 seconds, Stopping threads took: 0.0002249
> seconds
> > 2019-06-17T11:38:44.246+0100: 1550.967: [GC
> concurrent-string-deduplication, 2702.9K->1178.3K(1524.6K), avg 54.2%,
> 0.0038746 secs]
> > 2019-06-17T11:39:44.846+0100: 1611.566: Application time: 60.6031019
> seconds
> > {Heap before GC invocations=1196 (full 1):
> > garbage-first heap   total 7475200K, used 3054884K [0x0000000628400000,
> 0x0000000628607210, 0x00000007f0800000)
> >   region size 2048K, 256 young (524288K), 32 survivors (65536K)
> > Metaspace       used 53616K, capacity 85276K, committed 102388K,
> reserved 1114112K
>
> Nice to now that GC -- or rather memory usage -- is not a problem here.
>
> >
> > -          If I turn on some of the debug logging in JMeter, (for the
> network activity), then JMeter does still write to the log during this
> period, but at a drastically reduced rate; for example, typically the
> PoolingHttpClientConnectionManager generates a 'Closing connections idle
> longer than 1 MICROSECONDS' log up to 10 times per second, but during this
> period it writes that message zero times.
> Probably because all connections are still used waiting for the server.
> >
> >
> > -          Changing GC settings does have some effect on the frequency
> of this issue happening. With the default GC settings it happens quite
> often (multiple times per test). I'm currently running the following custom
> GC options, which reduces the frequency of the problem to about 50%
> probability of it happening once during a test:
> >
> >
> >
> > -XX:+HeapDumpOnOutOfMemoryError
> >
> > -Xms7299m
> >
> > -Xmx7299m
> >
> > -XX:MaxMetaspaceSize=256m
> >
> > -XX:+UseG1GC
> >
> > -XX:MaxGCPauseMillis=100
> >
> > -XX:G1ReservePercent=20
> >
> > -XX:+AlwaysPreTouch
> >
> > -XX:+ParallelRefProcEnabled
> >
> > -XX:+UseStringDeduplication
> >
> > -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses
> >
> > -XX:NewSize=512m
> >
> > -XX:MaxNewSize=3g
> >
> > -XX:NewRatio=2
> >
> > -XX:MaxMetaspaceExpansion=64M
>
> Wow, that are a lot of GC options. I tend to start without any GC
> options (except Xmx/Xms) and only add more if there are problems with GC.
>
>
> >
> >
> >
> > Any clues as to additional logging to further investigate, or JMeter
> settings to change, would be greatly appreciated.
>
> Take threaddumps when the halt is happening.
>
> Felix
>
> >
> > Thanks,
> >
> > Alistair
> >
> > ==================================================================
> > Alistair Evans
> > Chief Software Architect for Wholesale Banking
> > 7 Layer Solutions Limited
> > Tel: +44 (0)1242 225520
> > Email: [hidden email]<mailto:[hidden email]>
> > Web: www.7layer.net<http://www.7layer.net/>
> > ==================================================================
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>