Diagnosing memory leak issues
When you find that memory usage is very high and increases very quickly within a short period of time, it might be a memory leak issue, and you can analyze it with the following steps.
Please note that a memory increase does not always mean a memory leak. A memory leak issue usually shows these symptoms:
- Very fast and abnormal memory increase (usually under common or low traffic levels)
- Continuous memory increase without deallocation
- Used memory is not released even after traffic drops or stops
The most important thing for troubleshooting a memory leak issue is to locate which module, process or function causes the memory increase.
- Check the history logs to see the memory resource status:
Log&Report > Event > Filter > Action > check-resource
failure msg="mem usage raise too high,mem(67)
- Check whether there are any memory-related messages printed on the console.
- Check the number of connections to see whether the memory increase might be caused by too many concurrent connections.
/# netstat -nat | awk '{print $6}' | sort | uniq -c | sort -r
319800 ESTABLISHED
330 FIN_WAIT2
251 LISTEN
7 TIME_WAIT
1 established)
1 SYN_SENT
1 Foreign
If there are too many TIME_WAIT or FIN_WAIT2 connections, this may be abnormal because connections are not being closed normally.
If memory usage still does not decrease after the TIME_WAIT or FIN_WAIT2 connections are released, it may indicate a memory leak.
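If the number of ESTABLISHED connections itself looks unusually high, the follow-up sketch below (standard netstat/awk on the same backend shell, nothing FortiWeb-specific) shows which remote peers hold the most of them, which helps distinguish a genuine traffic surge from connections that are never released:
/# netstat -nat | awk '$6 == "ESTABLISHED" {split($5, a, ":"); print a[1]}' | sort | uniq -c | sort -rn | head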
- Execute “diagnose debug memory” several times, then compare the differences between the outputs to find which part/module/process shows the largest increase. Depending on how fast memory grows, you may adjust the interval at which you execute the command and collect the output, as in the sketch below.
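A minimal sketch of one way to compare the snapshots, assuming you can save the CLI output to files on a Linux workstation, for example via your terminal client's session logging, or over SSH if the appliance accepts one-shot command execution (the host name, credentials and file names below are placeholders):
ssh admin@<fortiweb-ip> "diagnose debug memory" > mem_1.txt
# wait until memory has grown noticeably (e.g. 10-30 minutes), then take a second snapshot
ssh admin@<fortiweb-ip> "diagnose debug memory" > mem_2.txt
diff -u mem_1.txt mem_2.txt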
- Use diagnose debug jemalloc-heap & diagnose system jeprof to trace and analyze memory occupation and the cause of memory growth over a period of time.
- If the jemalloc profile is activated and the memory usage exceeds the configured threshold, a heap file is generated in the directory /var/log/gui_upload.
- You can use jemalloc-heap to show or clear the heap files. At most 10 heap files are kept on the device.
- You can use jeprof to parse the heap files.
- Note that the jemalloc commands do not provide useful information when memory is not increasing.
1) Enable the jemalloc profile:
FortiWeb# diagnose debug jemalloc-conf proxyd enable
2) If memory increases quickly, execute the command below to generate dump files.
For example, you can wait for the memory usage to increase by 10% and then execute the command below; it is better to repeat the command several times, each time memory has grown by roughly another 10%:
FortiWeb# diagnose debug jemalloc proxyd dump
3) Check the dump heap files that have been generated:
FortiWeb # diagnose debug jemalloc-heap show
jeprof.out.28279.1641342474.heap
jeprof.out.4973.1641276249.heap
4) After getting a few heap files, execute the command below to parse them:
FortiWeb # diagnose system jeprof proxyd
Using local file /bin/proxyd
Using local file /var/log/gui_upload/jeprof.out.28279.1641342474.heap
Total: 124422365 B
34403589 27.7% 27.7% 34403589 27.7% ssl3_setup_write_buffer
34262011 27.5% 55.2% 34262011 27.5% ssl3_setup_read_buffer
18062121 14.5% 69.7% 18062121 14.5% CRYPTO_zalloc
17011023 13.7% 83.4% 17011023 13.7% _HTTP_init
9905760 8.0% 91.3% 9905760 8.0% BUF_MEM_grow
3195135 2.6% 93.9% 3195135 2.6% buffer_new
1583640 1.3% 95.2% 18857320 15.2% HTTP_substream_process_ctx_create
…
Using local file /bin/proxyd
Using local file /var/log/gui_upload/jeprof.out.4973.1641276249.heap
Total: 576387295 B
175840569 30.5% 30.5% 175840569 30.5% ssl3_setup_write_buffer
175415833 30.4% 60.9% 175415833 30.4% ssl3_setup_read_buffer
81823328 14.2% 75.1% 81823328 14.2% CRYPTO_zalloc
72087699 12.5% 87.6% 72612307 12.6% _HTTP_init
8578052 1.5% 89.1% 84473564 14.7% HTTP_substream_process_ctx_create
7654262 1.3% 90.5% 7654262 1.3% asn1_enc_save
7311586 1.3% 91.7% 7311586 1.3% HTTP_get_modify_value_by_name
6855757 1.2% 92.9% 6855757 1.2% pt_stream_create_svrinfo
5851046 1.0% 93.9% 5851046 1.0% _hlp_parse_cookie
5136808 0.9% 94.8% 5136808 0.9% HTTP_process_ctx_create
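In the text output above, the first two columns are the bytes allocated directly by each function and their share of the total, while the fourth and fifth columns also include allocations made by the function's callees. To focus only on what grew between two dumps, the standard --base option of the jeprof tool can be used from the backend shell (a sketch only; <pid>, <earlier> and <later> are placeholders, and both dumps should come from the same proxyd process instance):
/var/log/gui_upload# jeprof --text --base=jeprof.out.<pid>.<earlier>.heap /bin/proxyd jeprof.out.<pid>.<later>.heap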
5) Use a graph tool to analyze the function call relationships from the .heap files.
This tool is intended for internal R&D investigation only and is provided here for reference.
- Generate a .dot file on FortiWeb backend shell:
jeprof --dot /bin/proxyd jeprof.out.4973.1641276249.heap > 1641276249.dot
- Copy 1641276249.dot to an Ubuntu machine.
- Install graphviz on Ubuntu:
apt install graphviz
- Generate a PNG picture:
dot -Tpng 1641276249.dot -o 1641276249.png
A PNG image will be generated as below, indicating the top memory-consuming functions and the function call relationships. Taking the case below as an example, one can check whether the HTTPS traffic load has increased or whether the related configuration has changed.
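Because the .dot file is plain Graphviz source, you can also render a scalable SVG instead of (or in addition to) the PNG, which is easier to zoom into for large call graphs:
dot -Tsvg 1641276249.dot -o 1641276249.svg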
6) You can also download the jeprof.out files and provide them to the support team for further investigation:
/var/log/gui_upload# ls jeprof.out* -l
-rw-r--r-- 1 root 0 109251 Sep 27 18:30 jeprof.out.11164.1632789019.heap
-rw-r--r-- 1 root 0 111975 Dec 22 12:22 jeprof.out.3777.1640200954.heap
Note: In jeprof.out.3777.1640200954.heap:
3777 is the PID of proxyd
1640200954 is the UNIX timestamp; you can convert it to a human-readable date so that you only need to pay attention to recent dump files. This is useful for identifying the most recent heap dump files when there are many of them.
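For example, on a Linux workstation with GNU coreutils, the standard date utility converts the epoch value locally (output shown in UTC; without -u the workstation's local time zone applies):
date -u -d @1640200954
Wed Dec 22 19:22:34 UTC 2021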
- As stated in point 2, after the 6.4.0 GA release, a regular monitoring file is generated at /var/log/gui_upload/debug_memory.txt. You can set a memory boundary for it: if the memory usage reaches the boundary and proxyd or ml_daemon is among the top 10 processes by memory usage, their jemalloc debug function is enabled automatically.
FortiWeb # show full system global
config system global
set debug-memory-boundary 70 #memory usage percentage, 1%-100%
end
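A minimal sketch of adjusting the boundary from the CLI, using the same option shown in the output above (choose a percentage appropriate for your deployment):
config system global
set debug-memory-boundary 70
end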