MongoDB's built-in monitoring tools: mongostat and mongotop

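MongoDB ships with two command-line tools, mongostat and mongotop, for monitoring a running instance. Both are very useful when troubleshooting problems such as a slow database, because they report detailed statistics on MongoDB's current state. Beyond these two tools, you can also monitor with db.serverStatus() and db.stats(), or enable the profiler and analyze what it logs.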

1. mongotop

mongotop tracks a MongoDB instance and reports statistics for each collection. By default, mongotop refreshes once per second.

./bin/mongotop --help

View live MongoDB collection statistics.

Options:
  --help                                produce help message
  -v [ --verbose ]                      be more verbose (include multiple times
                                        for more verbosity e.g. -vvvvv)
  --quiet                               silence all non error diagnostic
                                        messages
  --version                             print the version number
  -h [ --host ] arg                     host address (<set name>/s1,s2 for sets)
  --port arg                            server port; can also use --host
                                        hostname:port
  --ipv6                                enable IPv6 support (disabled by
                                        default)
  -u [ --username ] arg                 username
  -p [ --password ] arg                 password
  --authenticationDatabase arg          user source (defaults to dbname)
  --authenticationMechanism arg (=MONGODB-CR)
                                        authentication mechanism
  --gssapiServiceName arg (=mongodb)    Service name to use when authenticating
                                        using GSSAPI/Kerberos
  --gssapiHostName arg                  Remote host name to use for purpose of
                                        GSSAPI/Kerberos authentication
  --locks                               show database lock information

For example:

ns                     total    read    write
User.User.user           0ms     0ms      0ms
User.system.indexes      0ms     0ms      0ms
......

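Output fields:

ns: the namespace, which combines the database name and the collection name.

db: the database name; this field is reported when mongotop runs with --locks. The database named . refers to the global lock rather than a specific database.

total: the total time mongod spent operating on this namespace.

read: the time mongod spent performing read operations on this namespace.

write: the time mongod spent performing write operations on this namespace.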
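The per-namespace timings lend themselves to simple post-processing. As a minimal sketch, assuming mongotop output in the four-column layout of the example (ns, total, read, write, with times suffixed by ms), the following Python parses the rows and ranks namespaces by total time. The names TopRow, parse_mongotop, and busiest are illustrative and not part of any MongoDB tool.

```python
from typing import List, NamedTuple


class TopRow(NamedTuple):
    ns: str
    total_ms: int
    read_ms: int
    write_ms: int


def parse_mongotop(text: str) -> List[TopRow]:
    """Parse mongotop text output, skipping headers and filler lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # A data row is a namespace followed by exactly three "<n>ms" values.
        if len(parts) != 4 or not all(p.endswith("ms") for p in parts[1:]):
            continue
        total, read, write = (int(p[:-2]) for p in parts[1:])
        rows.append(TopRow(parts[0], total, read, write))
    return rows


def busiest(rows: List[TopRow], n: int = 3) -> List[TopRow]:
    """Return the n namespaces with the largest total time."""
    return sorted(rows, key=lambda r: r.total_ms, reverse=True)[:n]


if __name__ == "__main__":
    sample = (
        "ns total read write\n"
        "User.User.user 0ms 0ms 0ms\n"
        "User.system.indexes 0ms 0ms 0ms\n"
    )
    for row in busiest(parse_mongotop(sample)):
        print(row)
```

Sorting by total_ms is only one possible choice; sorting by write_ms instead would highlight write-heavy collections.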

2. mongostat

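mongostat refreshes its status values once per second and presents them in a readable form. Together, these values give an overall picture of the instance's performance.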

./bin/mongostat --help

View live MongoDB performance statistics.

usage: mongostat [options] [sleep time]

sleep time: time to wait (in seconds) between calls

Options:
  --help                                produce help message
  -v [ --verbose ]                      be more verbose (include multiple times
                                        for more verbosity e.g. -vvvvv)
  --quiet                               silence all non error diagnostic
                                        messages
  --version                             print the program's version and exit
  -h [ --host ] arg                     mongo host to connect to (<set
                                        name>/s1,s2 for sets)
  --port arg                            server port. Can also use --host
                                        hostname:port
  --ipv6                                enable IPv6 support (disabled by
                                        default)
  -u [ --username ] arg                 username
  -p [ --password ] arg                 password
  --authenticationDatabase arg          user source (defaults to dbname)
  --authenticationMechanism arg (=MONGODB-CR)
                                        authentication mechanism
  --gssapiServiceName arg (=mongodb)    Service name to use when authenticating
                                        using GSSAPI/Kerberos
  --gssapiHostName arg                  Remote host name to use for purpose of
                                        GSSAPI/Kerberos authentication
  --noheaders                           don't output column names
  -n [ --rowcount ] arg (=0)            number of stats lines to print (0 for
                                        indefinite)
  --http                                use http instead of raw db connection
  --discover                            discover nodes and display stats for
                                        all
  --all                                 all optional fields

For example (the port is given with --port, because -p is the short form of --password):

./mongostat -h 80.81.2.3 --port 27017

insert  query update delete getmore command flushes mapped  vsize   res faults   locked db idx miss % qr|qw ar|aw netIn netOut conn     set repl     time
    *0     *0     *0     *0       0     2|0       0   448m  3.22g   74m      0   test:0.0%          0   0|0   0|0  120b     4k   26 shard_a  PRI 17:43:51
    *0     *0     *0     *0       0     3|0       0   448m  3.22g   74m      0  local:0.0%          0   0|0   0|0  353b     4k   26 shard_a  PRI 17:43:52
    *0     *0     *0     *0       1     2|0       0   448m  3.22g   74m      0   test:0.0%          0   0|0   0|0  167b     4k   26 shard_a  PRI 17:43:53
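For scripted analysis it helps to turn each output row into named fields. The sketch below assumes exactly the 21-column layout of the sample output, including the two multi-word headers (locked db and idx miss %); a standalone instance or a different mongostat version prints different columns, in which case COLUMNS would need adjusting. The names COLUMNS and parse_row are illustrative.

```python
from typing import Dict, Optional

# Column names in the order of the sample output; this list is an assumption
# tied to that specific layout, not a fixed mongostat contract.
COLUMNS = [
    "insert", "query", "update", "delete", "getmore", "command",
    "flushes", "mapped", "vsize", "res", "faults", "locked db",
    "idx miss %", "qr|qw", "ar|aw", "netIn", "netOut", "conn",
    "set", "repl", "time",
]


def parse_row(line: str) -> Optional[Dict[str, str]]:
    """Map one whitespace-separated data row onto COLUMNS.

    Returns None for lines whose token count does not match, which also
    filters out the header line (its multi-word names split into more tokens).
    """
    tokens = line.split()
    if len(tokens) != len(COLUMNS):
        return None
    return dict(zip(COLUMNS, tokens))


if __name__ == "__main__":
    line = ("*0 *0 *0 *0 0 2|0 0 448m 3.22g 74m 0 test:0.0% 0 "
            "0|0 0|0 120b 4k 26 shard_a PRI 17:43:51")
    print(parse_row(line))
```

Values are kept as strings on purpose, since fields such as *0, 2|0, and test:0.0% each need their own interpretation.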

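More examples:

Print 20 rows, one per second (20 seconds of data):

mongostat -h 80.81.2.3 --port 27017 --rowcount 20 1

Print 300 rows, one every 5 seconds. Note that -n counts rows, not seconds, so this run lasts 1500 seconds; use -n 60 to cover 300 seconds:

mongostat -h 80.81.2.3 --port 27017 -n 300 5

Output in JSON format. The --json option is not listed in the help output above, so it requires a newer mongostat build than the one shown there:

mongostat -h 80.81.2.4 --port 27017 -n 60 1 --json

Collect the running state of all members of the replica set aCloud for 60 seconds, refreshing every second, and append it to the file mongostat_aCloud.log:

mongostat -h aCloud/80.81.2.4,80.81.2.5,80.81.2.6 --port 27017 -n 60 1 >> mongostat_aCloud.log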
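Two details of these invocations are easy to get wrong: the port must be passed with --port (the help output lists -p as --password), and --rowcount is a number of rows rather than a number of seconds. The sketch below builds an argument list using only flags that appear in the help output, and derives the row count from a desired duration. The helper names mongostat_args and rows_for are hypothetical.

```python
import math
from typing import List


def mongostat_args(host: str, port: int = 27017,
                   rows: int = 0, interval: int = 1) -> List[str]:
    """Build a mongostat command line; rows=0 means run indefinitely."""
    return [
        "mongostat",
        "--host", host,
        "--port", str(port),       # not -p, which is --password
        "--rowcount", str(rows),
        str(interval),             # positional sleep time in seconds
    ]


def rows_for(duration_s: int, interval_s: int) -> int:
    """Number of rows needed to cover duration_s at one row per interval_s."""
    return math.ceil(duration_s / interval_s)


if __name__ == "__main__":
    # Cover 300 seconds with one row every 5 seconds.
    print(mongostat_args("80.81.2.3", rows=rows_for(300, 5), interval=5))
```

The resulting list can be handed to subprocess.run on a machine where mongostat is installed.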

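Field descriptions:

insert: inserts per second

query: queries per second

update: updates per second

delete: deletes per second

Ten simple queries may well finish faster than one complex query, so the magnitude of these counters says little about actual load.

They do at least tell you whether queries and inserts are being processed right now.

On a slave, the values are usually prefixed with *, which marks them as replicated operations.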

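getmore: cursor getmore operations per second

command: commands executed per second

A bulk insert, for example, counts as a single command, so this number means little on its own.

On a slave, two values are shown as local|replicated; comparing the two may reveal a problem.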

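flushes: flushes to disk per second

This is usually 0 or 1. Measuring the time between two consecutive 1s gives a rough idea of how often a flush happens.

Flushes are expensive; if they occur frequently, look for the cause.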
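The interval calculation described above can be sketched in a few lines. This assumes each sample is a (time, flushes) pair taken from the time and flushes columns, with time in HH:MM:SS as in the sample output; the function name flush_intervals is illustrative.

```python
from datetime import datetime
from typing import Iterable, List, Tuple


def flush_intervals(samples: Iterable[Tuple[str, int]]) -> List[int]:
    """Seconds between consecutive samples that reported at least one flush."""
    times = [datetime.strptime(t, "%H:%M:%S") for t, flushes in samples if flushes > 0]
    # The modulo keeps the result positive when samples cross midnight,
    # since the time column carries no date.
    return [int((b - a).total_seconds()) % 86400 for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    samples = [("17:43:51", 1), ("17:43:52", 0), ("17:44:51", 1), ("17:45:51", 1)]
    print(flush_intervals(samples))
```

Consistently short intervals in the result would be the "frequent flush" case worth investigating.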

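mapped: amount of memory-mapped data

vsize: virtual memory used by the process

res: resident (physical) memory used by the process

These match what top shows. mapped and vsize usually do not change much, while res climbs slowly. If res often drops suddenly, check whether another program is consuming large amounts of memory.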
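Spotting sudden drops in res is easier with the size strings converted to numbers. The following sketch assumes the suffixes seen in the sample output (b, k, m, g) and treats them as powers of 1024, which is an assumption about how mongostat scales its values. The 20% drop ratio is an arbitrary illustrative default, and the names to_bytes and sudden_drops are hypothetical.

```python
from typing import List

# Assumed unit scale: binary multiples.
UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def to_bytes(value: str) -> float:
    """Convert strings such as '74m' or '3.22g' to a byte count."""
    if value[-1].isdigit():
        return float(value)  # no suffix: treat as plain bytes
    return float(value[:-1]) * UNITS[value[-1].lower()]


def sudden_drops(res_values: List[str], ratio: float = 0.2) -> List[int]:
    """Indexes of samples where res fell by more than `ratio` versus the previous one."""
    sizes = [to_bytes(v) for v in res_values]
    return [i for i in range(1, len(sizes)) if sizes[i] < sizes[i - 1] * (1 - ratio)]


if __name__ == "__main__":
    print(sudden_drops(["74m", "75m", "40m", "41m"]))
```

Each returned index marks a sample worth cross-checking against other processes' memory use at that time.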

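faults: page faults per second

Don't be alarmed by the name; under heavy load this value is often non-zero. If it is non-zero most of the time, it is time to add memory.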

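locked: percentage of time the write lock was held

Older MongoDB releases have a single global read-write lock, and this column reports how long its write side was held. In the sample output above, the column is labelled locked db and names a database alongside its percentage. A value that is too high (frequently above 10%) means something is wrong.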
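To judge whether the lock percentage "frequently" exceeds 10%, one option is to compute the share of samples above that threshold. This sketch accepts both a bare percentage and the db:percent form seen in the sample output; the function names parse_locked and share_over are illustrative.

```python
from typing import List, Optional, Tuple


def parse_locked(value: str) -> Tuple[Optional[str], float]:
    """Split 'test:0.0%' into ('test', 0.0); a bare '5.2%' gives (None, 5.2)."""
    db, sep, pct = value.rpartition(":")
    return (db if sep else None), float(pct.rstrip("%"))


def share_over(values: List[str], threshold: float = 10.0) -> float:
    """Fraction of samples whose lock percentage exceeds the threshold."""
    if not values:
        return 0.0
    pcts = [parse_locked(v)[1] for v in values]
    return sum(p > threshold for p in pcts) / len(pcts)


if __name__ == "__main__":
    print(share_over(["test:0.0%", "local:0.0%", "test:0.0%"]))
```

What share counts as "frequently" is a judgment call left to the operator.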

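idx miss %: percentage of index accesses that required a page fault to load a B-tree node

This is a very important metric. Under normal conditions index accesses should be served from memory, so idx miss should be 0. If the value is large, check whether the indexes still fit in memory, and whether queries are using suitable indexes at all.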

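qr|qw: queue lengths for clients waiting (read|write)

ar|aw: active clients (read|write)

If these values are large, the database is backed up: it is processing requests more slowly than they arrive.

Check for expensive slow queries first. If the queries look fine and the load really is that heavy, add machines.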
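The paired columns can be split and compared against a limit to flag a backed-up database automatically. The text does not say what counts as "large", so the limit is a required parameter here rather than a recommended value; split_pair and is_backed_up are hypothetical helper names.

```python
from typing import Tuple


def split_pair(value: str) -> Tuple[int, int]:
    """Split a 'read|write' column such as '0|0' into two integers."""
    left, right = value.split("|")
    return int(left), int(right)


def is_backed_up(qr_qw: str, ar_aw: str, limit: int) -> bool:
    """True if queued or active clients (read + write) exceed the caller's limit."""
    queued = sum(split_pair(qr_qw))
    active = sum(split_pair(ar_aw))
    return queued > limit or active > limit


if __name__ == "__main__":
    # Values from the sample output; limit=10 is only an illustrative choice.
    print(is_backed_up("0|0", "0|0", limit=10))
```

A sensible limit depends on the workload, so it is best calibrated against readings taken while the system is healthy.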

netIn: inbound network traffic, in bytes

netOut: outbound network traffic, in bytes

These show network bandwidth pressure. The network is usually not the bottleneck for MongoDB.

conn: number of open connections

MongoDB creates one thread per connection, and creating and releasing threads has a cost. Try to keep this number from growing large.

repl: the server's current replication state

M - master

PRI - primary (as in the sample output above)

SEC - secondary

REC - recovering

UNK - unknown

SLV - slave

time: the current time

Writing the monitoring output to a file:

E:\mongodb-win32-x86_64-2.2.1\bin\mongostat -n 2 > E:\test.txt

mongostat -h 80.81.2.3 --port 27017 --rowcount 20 1 > /data/log.txt
