Ops in Practice

2015-12-25

Practice, Ops

WebSocket service operations

0. Network proxy tool

Step-by-step illustrated tutorial: setting up a network proxy tool

1. Server configuration

Amazon Linux AMI (Linux version 4.9.76-3.78.amzn1.x86_64, compiled with GCC 7.2.1 (Red Hat 7.2.1-2))
Memory: 16 GB
CPU: 1 physical, 4 logical
Disk: 300 GB

2. Checking system load

Count the connections actually established on a port


netstat -nat|grep ESTABLISHED|grep -i "38080"|wc -l
# list listening TCP ports and the process that owns each one
netstat -tlnp
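The grep pipeline above matches 38080 anywhere on the line, so it also counts connections whose remote address or port merely contains that string. A minimal sketch of a stricter count, assuming the usual netstat column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the function name is made up for this example:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count ESTABLISHED TCP connections whose LOCAL port is exactly $1.
# Reads `netstat -nat` output on stdin; columns are
#   Proto Recv-Q Send-Q Local-Address Foreign-Address State
count_established() {
  local port="${1:?usage: count_established PORT}"
  awk -v p="$port" '
    $6 == "ESTABLISHED" {
      lp = $4
      sub(/.*:/, "", lp)   # keep what follows the last ":", i.e. the port
      if (lp == p) n++
    }
    END { print n + 0 }'
}

# Usage on a live host:
#   netstat -nat | count_established 38080
```

Where iproute2 is installed, ss -tn state established '( sport = :38080 )' applies the same local-port filter in the kernel (its output includes one header line).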

top: a quick overview

Check how much memory programs are really using

# free -m

             total       used       free     shared    buffers     cached
Mem:         15586      12069       3517          0        381       5295
-/+ buffers/cache:       6392       9193   // memory really used by programs: used = used - buffers - cached; free = free + buffers + cached
Swap:            0          0          0
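The -/+ buffers/cache row is just used minus buffers and cached. A sketch that recomputes it from the older free -m layout shown above, assuming the header has separate buffers and cached columns (newer procps prints buff/cache and available instead, and there the available column is the better figure); the function name is illustrative:

```shell
#!/usr/bin/env bash
# Hypothetical helper: memory used by applications = used - buffers - cached (MB),
# i.e. the "-/+ buffers/cache" used figure, from older procps `free -m` output.
# Columns are looked up by header name; layouts without separate "buffers"
# and "cached" columns are reported as unsupported.
app_used_mb() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i + 1; next }  # +1: data rows start with "Mem:"
    $1 == "Mem:" && ("buffers" in col) && ("cached" in col) {
      print $(col["used"]) - $(col["buffers"]) - $(col["cached"]); found = 1
    }
    END { if (!found) { print "unsupported free(1) layout" > "/dev/stderr"; exit 1 } }'
}

# Usage: free -m | app_used_mb
```

For the sample above it prints 6393; free itself shows 6392, most likely because it subtracts in KB before converting to MB.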

iftop: real-time network traffic

sar: historical load

# yum install sysstat   // provides sar

# sar -q   // load averages

Linux 4.9.76-3.78.amzn1.x86_64       04/24/2018      _x86_64_        (4 CPU)

08:10:01 AM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
08:20:01 AM         1       181      1.11      1.43      1.42
08:30:01 AM         1       181      0.76      1.11      1.28
08:40:01 AM         1       181      1.34      1.28      1.28
08:50:01 AM         0       180      1.21      1.18      1.20
Average:            1       181      1.10      1.25      1.29

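Fields:

runq-sz: run queue length (number of tasks waiting to run)
plist-sz: number of processes and threads in the task list
ldavg-1: system load average over the last 1 minute
ldavg-5: system load average over the past 5 minutes
ldavg-15: system load average over the past 15 minutes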
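Load averages only mean something relative to the number of logical CPUs (4 on this server, 8 after the upgrade described later). A small sketch, assuming sysstat's sar -q column names as shown above, that finds the peak ldavg-1 and divides a load figure by the CPU count; both function names are made up:

```shell
#!/usr/bin/env bash
# Hypothetical helper: highest 1-minute load average (ldavg-1) in `sar -q`
# output read from stdin. The column is located by its header name, so both
# 12-hour (AM/PM) and 24-hour timestamps work; "Average:" rows are ignored.
max_ldavg1() {
  awk '
    { for (i = 1; i <= NF; i++) if ($i == "ldavg-1") { col = i; next } }
    col && $1 != "Average:" && $col ~ /^[0-9.]+$/ {
      if (!seen || $col + 0 > max) { max = $col + 0; seen = 1 }
    }
    END { if (seen) printf "%.2f\n", max }'
}

# Hypothetical helper: load divided by the number of logical CPUs. Values that
# stay above 1 mean more tasks want to run (or sit in uninterruptible I/O wait)
# than there are CPUs.
load_per_cpu() {
  awk -v l="${1:?load}" -v c="${2:?cpu count}" 'BEGIN { printf "%.2f\n", l / c }'
}

# Usage: load_per_cpu "$(sar -q | max_ldavg1)" "$(nproc)"
```

For instance, load_per_cpu 6.41 8 prints 0.80.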

sar -r: memory usage

Linux 4.9.76-3.78.amzn1.x86_64       04/24/2018      _x86_64_        (4 CPU)

08:10:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
08:20:01 AM   3608000  12352420     77.39    390508   5420888   6499432     40.72
08:30:01 AM   3594192  12366228     77.48    390512   5424568   6517400     40.83
08:40:01 AM   3598408  12362012     77.45    390516   5435772   6489352     40.66
08:50:01 AM   3622276  12338144     77.30    390520   5392172   6528288     40.90
09:00:01 AM   3600552  12359868     77.44    390524   5401404   6525636     40.89
Average:      3604686  12355734     77.41    390516   5414961   6512022     40.80

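Fields:

kbmemfree: free physical memory (KB)
kbmemused: used physical memory (KB)
%memused: percentage of physical memory in use
kbbuffers: physical memory used by the kernel as buffers (KB)
kbcached: physical memory used by the kernel as page cache (KB)
kbcommit: memory (RAM + swap) needed for the current workload, i.e. an estimate of how much is required so that the system never runs out of memory
%commit: kbcommit as a percentage of total memory (RAM + swap)

sar -b: I/O and transfer rate statistics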

09:09:28 AM       tps      rtps      wtps   bread/s   bwrtn/s
09:09:29 AM      3.06      0.00      3.06      0.00     40.82
09:09:30 AM      4.00      0.00      4.00      0.00     16.00
09:09:31 AM     36.63      0.00     36.63      0.00  14146.53
09:09:32 AM      0.00      0.00      0.00      0.00      0.00
09:09:33 AM     12.12      0.00     12.12      0.00    137.37
Average:        11.24      0.00     11.24      0.00   2907.63

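Fields:

tps: total transfers (I/O requests) per second issued to physical devices
rtps: read requests per second issued to physical devices
wtps: write requests per second issued to physical devices
bread/s: data read from the devices per second, in blocks (512-byte sectors)
bwrtn/s: data written to the devices per second, in blocks (512-byte sectors)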
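Since sysstat counts bread/s and bwrtn/s in 512-byte blocks, the 14146.53 bwrtn/s spike above is roughly 6.9 MB/s of writes. A one-line conversion sketch (the function name is illustrative):

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a blocks/s figure from `sar -b` to KB/s.
# One block is a 512-byte sector, i.e. 0.5 KB (1 KB = 1024 bytes).
blocks_to_kbs() {
  awk -v b="${1:?blocks per second}" 'BEGIN { printf "%.1f\n", b * 512 / 1024 }'
}

# Example: blocks_to_kbs 14146.53   (the peak bwrtn/s above)
```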

sar -u: CPU utilization

08:10:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
08:20:01 AM     all     31.22      0.00      0.73      0.04      0.00     68.01
08:30:01 AM     all     30.46      0.00      0.69      0.04      0.00     68.81
08:40:01 AM     all     29.37      0.00      0.70      0.04      0.00     69.89
08:50:01 AM     all     30.28      0.00      0.67      0.04      0.00     69.01
09:00:01 AM     all     31.74      0.00      0.69      0.04      0.00     67.53
09:10:01 AM     all     33.11      0.00      0.77      0.04      0.00     66.08
Average:        all     31.03      0.00      0.71      0.04      0.00     68.22

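Fields:

%user: percentage of CPU time spent in user mode
%nice: percentage of CPU time spent in user mode by processes whose scheduling priority was changed with nice
%system: percentage of CPU time spent in system (kernel) mode
%iowait: percentage of time the CPU was idle while waiting for outstanding disk I/O
%steal: under virtualization such as Xen, percentage of time the virtual CPU spent waiting while the hypervisor was serving another virtual processor
%idle: percentage of time the CPU was idle with no outstanding disk I/O

Network interface traffic

sar -n DEV                        # NIC traffic from midnight until now
sar -n DEV -f /var/log/sa/saxx    # NIC traffic history for day xx of the month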
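To compare sar -n DEV output with a bandwidth figure quoted in Mbit/s (such as the 10M bandwidth in the upgrade plan further down, if that figure is in Mbit/s), the per-second kB values need converting. A sketch, assuming sysstat's column names IFACE, rxkB/s and txkB/s and treating its kB as 1024 bytes; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: peak receive/transmit rate of one interface in
# `sar -n DEV` output (stdin), converted from kB/s to Mbit/s.
# Columns are found by header name, so AM/PM and 24-hour output both work.
peak_nic_mbps() {
  local iface="${1:?usage: peak_nic_mbps IFACE}"
  awk -v ifc="$iface" '
    {
      for (i = 1; i <= NF; i++) {
        if ($i == "IFACE")  ic = i
        if ($i == "rxkB/s") rc = i
        if ($i == "txkB/s") tc = i
      }
    }
    ic && $1 != "Average:" && $ic == ifc {
      if ($rc + 0 > rx) rx = $rc + 0
      if ($tc + 0 > tx) tx = $tc + 0
    }
    END {
      # kB/s * 1024 bytes * 8 bits / 1,000,000 = Mbit/s
      printf "rx %.2f Mbit/s, tx %.2f Mbit/s\n", rx * 8 * 1024 / 1000000, tx * 8 * 1024 / 1000000
    }'
}

# Usage: sar -n DEV -f /var/log/sa/sa28 | peak_nic_mbps eth0
```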

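3. Promotional campaign

Campaign preparation

Budget: 1,000,000
Flow: log in with Facebook → take part in the campaign to earn points → redeem points for cash (paid out via Taobao accounts)
Leaderboard: special prizes for the top three
Materials: APK, H5 share page, promotional images, ad-driven traffic, channel distribution

Client data

Day-1 retention: Umeng (before: 25%, now: 39%); GA (before: 45%, now: 63%)
On the 15th: Umeng 44%, GA %

New users: up 30%, 500,000 in 4 days
Campaign participants: 16,122 in 4 days
Activation participants: 1,633 in 4 days
Share page opens: 200/day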

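During the campaign
15th: 19,847 users with points; 2,052,420 points in total; about 103 points per user; top score 17,220
16th: 25,158 users with points; 2,757,580 points in total; about 109 points per user; top score 23,140
17th: 35,684 users with points; 3,967,600 points in total; about 111 points per user; top score 25,060
18th: 42,321 users with points; 5,005,240 points in total; about 118 points per user; top score 27,480; 2,812 invited users; 2,763,464 registered users in total
19th: 46,202 users with points; 5,804,820 points in total; about 125 points per user; top score 27,600; 2,996 invited users; 2,829,817 registered users in total
20th: 50,757 users with points; 6,613,820 points in total; about 130 points per user; top score 27,620; 3,249 invited users; 2,901,269 registered users in total
21st: 56,019 users with points; 7,583,420 points in total; about 135 points per user; top score 27,660; 3,249 invited users; 2,901,269 registered users in total
22nd: 59,650 users with points; 8,164,780 points in total; about 136 points per user; top score 27,660; 3,249 invited users; 3,078,666 registered users in total
23rd: 64,164 users with points; 8,972,620 points in total; about 139 points per user; top score 27,680; 3,949 invited users; 3,159,995 registered users in total
24th: 69,400 users with points; 9,728,340 points in total; about 140 points per user; top score 28,780; 4,223 invited users; 3,230,847 registered users in total
25th: 73,943 users with points; 10,421,840 points in total; about 140 points per user; top score 32,220; 4,492 invited users; 3,308,397 registered users in total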

June:
7th: 4,563,917 registered users in total

Campaign ended
28th: 83,639 users with points; 3,536,896 registered users in total

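Steady-state data

DAU: 600,000
Conversion rate: 20%

July 23: 8,962,249 registered users in total

Server data

NIC traffic: peak 3.185M/s
CPU: peak 16.23%
Load: peak 1.35

With 10,000 users online at the same time:
NIC traffic: peak 7.05M/s
CPU: peak 44.51%
Load: peak 4.05

* The campaign's H5 share page averages about 200 PV.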

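Tencent channel requirements

The only S-tier game worldwide: Monument Valley 2
S tier: 30 million users/month
A tier: 1 million users/month
B+ tier: the minimum tier Tencent will publish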

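Issues

Deploying a new feature made login hang, with no error reported

Cause:

With more than 3 million user records, looking up a user by the fb field took so long that login sat waiting for the query result, which looked as if the program had hung.

Fix:

Create an index on the fb field

db.players.createIndex({fb: 1})   // ensureIndex() is the deprecated older name for createIndex()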

Google Play (GP) has many listing rules; breaking them gets the app forcibly taken down. For example:

==No incentivized ratings==

Revenue

06-06 ad: 551, 7day 3636, 30day 11296, all 12493; app: 0, 7day 24, 30day 147, all 157

Cost

Daily ad spend: $2,000-3,000

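Summary

==C1000K, i.e. the million-connection problem==

Linux needs changes to kernel parameters and system configuration before it can support C1000K. A C1000K server needs at least 2 GB of memory, and at least 10 GB if the application itself also needs memory. The NIC should be at least 10 GbE.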
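The kernel-side changes are mostly about file-descriptor limits, since every socket is a file descriptor. A sketch that prints a sysctl fragment; the parameter names are standard Linux sysctls, but the values and the function name are assumptions for illustration, sized for roughly a million connections and not tested settings:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a sysctl fragment with limits commonly raised for
# very large numbers of concurrent TCP connections. The VALUES are assumptions.
print_c1000k_sysctl() {
  cat <<'EOF'
# Every socket is a file descriptor: raise the system-wide and per-process ceilings
fs.file-max = 2000000
fs.nr_open = 2000000
# Longer accept and SYN queues for bursts of new connections
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
# More ephemeral ports for outgoing connections (e.g. proxy -> backend)
net.ipv4.ip_local_port_range = 1024 65535
EOF
}

# Usage (as root):
#   print_c1000k_sysctl > /etc/sysctl.d/90-c1000k.conf
#   sysctl -p /etc/sysctl.d/90-c1000k.conf
```

The per-process limit has to be raised as well (ulimit -n, or a nofile entry in /etc/security/limits.conf), and it cannot exceed fs.nr_open. somaxconn only caps the backlog an application requests in listen(), so the application side may need raising too.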

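Suppose 20% of the million connections are active and each transfers 1 KB per second. The bandwidth needed is then 0.2M x 1 KB/s x 8 = 1.6 Gbps, more than a gigabit NIC can carry, so the server needs at least a 10 GbE (10 Gbps) NIC.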

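In a test with 100,000 connections that were all idle (sending and receiving nothing), the process itself used less than 1 MB of memory. Comparing the output of free before and after the program exited, however, showed the operating system using roughly 200 MB to maintain those 100,000 connections. At a million connections the OS alone would need about 2 GB, i.e. about 2 KB per connection.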
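Both estimates above are simple products. A sketch that reproduces them so other connection counts or traffic rates can be plugged in; 1 KB is taken as 1000 bytes as in the text, and the function names are made up:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for back-of-the-envelope C1000K capacity numbers.

# Bandwidth (Gbit/s) = connections x active ratio x KB/s per active connection x 8 bits
bandwidth_gbps() {
  awk -v n="$1" -v r="$2" -v kb="$3" 'BEGIN { printf "%.1f\n", n * r * kb * 1000 * 8 / 1e9 }'
}

# Kernel memory (MB) = connections x KB of kernel memory per idle connection / 1000
kernel_mem_mb() {
  awk -v n="$1" -v kb="$2" 'BEGIN { printf "%.0f\n", n * kb / 1000 }'
}

bandwidth_gbps 1000000 0.2 1   # 20% of 1M connections at 1 KB/s each
kernel_mem_mb  1000000 2       # 1M idle connections at about 2 KB each
```

It prints 1.6 and 2000, matching the 1.6 Gbps and roughly 2 GB figures in the text.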

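Targets

Benchmarks

L 25% -> 25% (target met), two-week 11% -> 14%

W 33% -> 40%

B 32% -> 41%

Casual games

Day-1 retention 40%, two-week retention 14% (the passing bar; below it, channels will not promote the game)

Conversion rate (share of ad clicks that actually download and install): 7%; unit price 1.1; daily ad spend must stay under 10,000 to avoid losing money

TopA gameplay + TopB art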

nginx

Number of physical CPUs
grep "physical id" /proc/cpuinfo|sort -u|wc -l
Number of logical CPUs
cat /proc/cpuinfo |grep "processor"|sort -u|wc -l

Configuration

nginx

user nginx;
worker_processes 8;
error_log /data/log/nginx/error.log;
pid /var/run/nginx.pid;

# Load dynamic modules. See /usr/share/doc/nginx/README.dynamic.
include /usr/share/nginx/modules/*.conf;

events {
    use   epoll;
    worker_connections 20480;
}

http {
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    #access_log  /data/log/nginx/access.log  main;

    sendfile            on;
    #tcp_nopush          on;
    tcp_nodelay         on;
    keepalive_timeout   3000;
    types_hash_max_size 20480;

    include             /etc/nginx/mime.types;
    default_type        application/octet-stream;

    # Load modular configuration files from the /etc/nginx/conf.d directory.
    # See http://nginx.org/en/docs/ngx_core_module.html#include
    # for more information.
    #include /etc/nginx/conf.d/*.conf;

    #index   index.html index.htm;

    map $http_upgrade $connection_upgrade {
        default upgrade;
        ''      close;
    }

    upstream wsbackend {
        hash $remote_addr consistent; # map clients to backends by client IP (consistent hashing)
        server 127.0.0.1:3801 weight=1;
        server 127.0.0.1:3802 weight=1;
        server 127.0.0.1:3803 weight=1;
        server 127.0.0.1:3804 weight=1;
        server 127.0.0.1:3805 weight=1;
        server 127.0.0.1:3806 weight=1;
    }

    server {
        listen       8080;
        server_name  127.0.0.1;

        location / {
            proxy_pass http://wsbackend;
            proxy_set_header Host $host:$server_port;
            proxy_http_version 1.1;
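            proxy_connect_timeout 60s;  # setting 1
            proxy_read_timeout 3000s;   # setting 2: if idle connections still get dropped, try a longer value
            proxy_send_timeout 120s;    # setting 3
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade;  # from the map above: "upgrade" for WebSocket requests, "close" otherwise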
        }
    }
}
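worker_connections in the config above limits every connection a worker holds, including the ones nginx opens to the upstream, so each proxied WebSocket client costs two. A rough ceiling for this config, as a sketch (the function name is illustrative):

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough ceiling on concurrent proxied WebSocket clients.
# worker_connections limits ALL connections of a worker, and each proxied
# client holds two of them: client <-> nginx and nginx <-> upstream.
max_ws_clients() {
  local workers="${1:?worker_processes}" conns="${2:?worker_connections}"
  echo $(( workers * conns / 2 ))
}

max_ws_clients 8 20480   # worker_processes and worker_connections from the config above
```

It prints 81920. Each worker also needs an open-file limit of at least worker_connections; nginx's worker_rlimit_nofile directive sets that for the workers directly.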

Starting the services

pm2 start app.js --node-args="--nouse-idle-notification" --name socket3801 -e logs/socket3801.err -o logs/socket3801.out -- 3801
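The upstream block lists six backend ports (3801 to 3806), while the command above starts one of them. A sketch of starting all of them in a loop, reusing the original pm2 arguments unchanged; the function name and the PM2 override (useful for a dry run with PM2=echo) are assumptions of this sketch:

```shell
#!/usr/bin/env bash
# Hypothetical helper: start one pm2-managed instance per backend port.
# Set PM2=echo to print the commands instead of running them.
start_socket_servers() {
  local pm2="${PM2:-pm2}" port
  for port in "$@"; do
    "$pm2" start app.js --node-args="--nouse-idle-notification" \
      --name "socket$port" \
      -e "logs/socket$port.err" -o "logs/socket$port.out" \
      -- "$port"
  done
}

# Usage: start_socket_servers 3801 3802 3803 3804 3805 3806
```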

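Numbers

Before: 1 service process, one CPU at 100%

Max server connections: 12,300

Now: 3 service processes, CPU 66%

Max server connections: 24,696 (14:00), 32,357 (16:30), 33,980 (17:00)

netstat -nat|grep ESTABLISHED|grep -i "38080"|wc -l
Max connections per service
07-27: 8,389 (14:00), 15,619 (24:00)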

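OOM: notes on a MongoDB crash

First signs

The server had a little over 3,000 connections, which is normal:

$ netstat -nat|grep ESTABLISHED|wc -l
3793

The load, however, was wrong: every service process was at around 100% CPU, a level normally reached only at 30,000+ connections, and mongod had disappeared from the process list.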

$ top
top - 00:51:40 up 64 days, 20:25,  1 user,  load average: 2.60, 2.65, 2.80
Tasks: 148 total,   4 running, 144 sleeping,   0 stopped,   0 zombie
Cpu(s): 35.1%us,  0.2%sy,  0.0%ni, 64.7%id,  0.0%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  32130856k total,  9380080k used, 22750776k free,    66948k buffers
Swap:        0k total,        0k used,        0k free,   986712k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2248 root      20   0 3317m 2.3g  11m R 121.8  7.4   1056:43 node /data/run/
 2088 root      20   0 3193m 2.1g  11m R 85.6  7.0 945:55.43 node /data/run/
 2206 root      20   0 3634m 2.6g  12m R 69.6  8.5   1106:46 node /data/run/
 6599 root      20   0  934m  41m   9m S  1.0  0.1   3:03.96 node /data/run/

Checking the service logs

None of the service processes could connect to MongoDB: error:MongoError: no connection available for operation and number of stored operation > 0

# pm2 logs 12
[TAILING] Tailing last 15 lines for [12] process (change the value with --lines option)
[INFO] error - 1000 login userId:undefined error:MongoError: no connection available for operation and number of stored operation > 0
[INFO] error - 8000 getSwitchs userId:undefined dataObj:{"cmd":"8000","data":{"value":"GooglePlay"}} error:MongoError: no connection available for operation and number of stored operation > 0
[INFO] error - 1000 login userId:undefined error:MongoError: no connection available for operation and number of stored operation > 0
[INFO] error - 8000 getSwitchs userId:undefined dataObj:{"cmd":"8000","data":{}} error:MongoError: no connection available for operation and number of stored operation > 0

Restarted MongoDB immediately to restore service.
Checking the MongoDB log

The entry logged just before it died: [ftdc] serverStatus was very slow: { after basic: 440, after asserts: 560, after connections: 870, after extra_info: 13400, after globalLock: 14610, after locks: 14800, after network: 14810, after opcounters: 14810, after opcountersRepl: 14810, after storageEngine: 14820, after tcmalloc: 15000, after wiredTiger: 26510, at end: 26590 }

# tail -n500 -f /var/log/mongodb/mongod.log
2018-07-24T15:19:25.880+0000 I COMMAND  [ftdc] serverStatus was very slow: { after basic: 40, after asserts: 150, after connections: 180, after extra_info: 220, after globalLock: 230, after locks: 240, after network: 330, after opcounters: 450, after opcountersRepl: 510, after storageEngine: 570, after tcmalloc: 1900, after wiredTiger: 2260, at end: 2550 }
2018-07-24T15:19:25.901+0000 I COMMAND  [conn31541] command admin.$cmd command: isMaster { ismaster: true } keyUpdates:0 writeConflicts:0 numYields:0 reslen:178 locks:{} protocol:op_query 383ms
2018-07-24T15:19:32.105+0000 I COMMAND  [conn31548] command admin.$cmd command: isMaster { ismaster: true } keyUpdates:0 writeConflicts:0 numYields:0 reslen:178 locks:{} protocol:op_query 5544ms
2018-07-24T15:19:32.105+0000 I COMMAND  [conn31553] command admin.$cmd command: isMaster { ismaster: true } keyUpdates:0 writeConflicts:0 numYields:0 reslen:178 locks:{} protocol:op_query 5544ms
2018-07-24T15:19:32.215+0000 I COMMAND  [conn31596] command admin.$cmd command: isMaster { ismaster: true } keyUpdates:0 writeConflicts:0 numYields:0 reslen:178 locks:{} protocol:op_query 128ms
2018-07-24T15:19:48.327+0000 I COMMAND  [PeriodicTaskRunner] task: DBConnectionPool-cleaner took: 632ms
2018-07-24T15:20:28.911+0000 I COMMAND  [conn31542] command admin.$cmd command: isMaster { ismaster: true } keyUpdates:0 writeConflicts:0 numYields:0 reslen:178 locks:{} protocol:op_query 5963ms
2018-07-24T15:20:28.911+0000 I COMMAND  [conn31549] command admin.$cmd command: isMaster { ismaster: true } keyUpdates:0 writeConflicts:0 numYields:0 reslen:178 locks:{} protocol:op_query 5963ms
2018-07-24T15:20:28.911+0000 I COMMAND  [conn31559] command admin.$cmd command: isMaster { ismaster: true } keyUpdates:0 writeConflicts:0 numYields:0 reslen:178 locks:{} protocol:op_query 5963ms
2018-07-24T15:20:29.167+0000 I COMMAND  [conn31554] command admin.$cmd command: isMaster { ismaster: true } keyUpdates:0 writeConflicts:0 numYields:0 reslen:178 locks:{} protocol:op_query 5963ms
2018-07-24T15:20:29.266+0000 I COMMAND  [conn31597] command admin.$cmd command: isMaster { ismaster: true } keyUpdates:0 writeConflicts:0 numYields:0 reslen:178 locks:{} protocol:op_query 5963ms
2018-07-24T15:20:30.159+0000 I COMMAND  [ftdc] serverStatus was very slow: { after basic: 440, after asserts: 560, after connections: 870, after extra_info: 13400, after globalLock: 14610, after locks: 14800, after network: 14810, after opcounters: 14810, after opcountersRepl: 14810, after storageEngine: 14820, after tcmalloc: 15000, after wiredTiger: 26510, at end: 26590 }

MongoDB had built up a large backlog of unfinished work and was badly short of memory.

# mongostat
insert query update delete getmore command % dirty % used flushes vsize   res qr|qw ar|aw netIn netOut conn                 time
     2   332     32     *0       0    49|0     0.8   30.0       0 6.11G 5.23G   0|0   0|0 70.3k  1.62m  324 2018-07-25T08:37:22Z
     4   364     40      2       0    48|0     0.8   30.0       0 6.11G 5.23G   0|0   0|0  114k  1.92m  324 2018-07-25T08:37:23Z
     6   365     35      4       0    52|0     0.8   30.0       0 6.12G 5.23G   0|0   0|0 79.9k  1.68m  324 2018-07-25T08:37:24Z

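Checking the system log for OOM events

tail -n1000 -f /var/log/messages

The log contains "sendmail invoked oom-killer:". sendmail is only the process whose memory allocation failed and triggered the OOM killer; it is not what used up the memory.

mongod had by far the largest resident set (rss is counted in 4 KB pages, so 4,594,693 pages ≈ 17.5 GB):

[ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[ 2390]   498  2390  5455340  4594693   10546      23        0             0 mongod

so the kernel killed it when memory ran out:

Out of memory: Kill process 2390 (mongod) score 573 or sacrifice child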
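Instead of tailing the whole file, the kill records can be pulled out directly. A sketch, assuming the "Killed process PID (name) total-vm:..., anon-rss:...kB" wording seen in this log (other kernel versions word it slightly differently); the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list OOM-killer victims found in a kernel log read on
# stdin (/var/log/messages, dmesg output, ...) as "PID anon-rss-MB name".
oom_victims() {
  sed -nE 's/.*Killed process ([0-9]+) \((.*)\) total-vm:.*anon-rss:([0-9]+)kB.*/\1 \3 \2/p' |
    awk '{ pid = $1; mb = $2 / 1024; sub(/^[^ ]+ [^ ]+ /, ""); printf "%s %.0f %s\n", pid, mb, $0 }'
}

# Usage: oom_victims < /var/log/messages
```

On the log below it reports 2390 17948 mongod, i.e. about 17.5 GB of anonymous memory.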

Full log:

Jul 24 15:16:06 kernel: [5568525.691883] TCP: request_sock_TCP: Possible SYN flooding on port 3803. Sending cookies.  Check SNMP counters.
Jul 24 15:17:25 dhclient[2024]: XMT: Solicit on eth0, interval 111420ms.
Jul 24 15:19:15 dhclient[2024]: XMT: Solicit on eth0, interval 109420ms.
Jul 24 15:20:38 kernel: [5568798.064590] sendmail invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, order=0, oom_score_adj=0
Jul 24 15:20:38 kernel: [5568798.070010] sendmail cpuset=/ mems_allowed=0
Jul 24 15:20:38 kernel: [5568798.072544] CPU: 0 PID: 2361 Comm: sendmail Tainted: G            E   4.9.76-3.78.amzn1.x86_64 #1
Jul 24 15:20:38 kernel: [5568798.076974] Hardware name: Amazon EC2 m5.2xlarge/, BIOS 1.0 10/16/2017
Jul 24 15:20:38 kernel: [5568798.079800]  ffffc90005f0b9f0 ffffffff8130de42 ffffc90005f0bbb8 ffff8807e5e68000
Jul 24 15:20:38 kernel: [5568798.083899]  ffffc90005f0ba68 ffffffff8120648c 0000000000000000 0000000000000000
Jul 24 15:20:38 kernel: [5568798.087983]  ffffc90005f0ba78 ffffffff8119ea90 0000000c05f0baf4 ffffc90005f0ba58
Jul 24 15:20:38 kernel: [5568798.092071] Call Trace:
Jul 24 15:20:38 kernel: [5568798.093745]  [<ffffffff8130de42>] dump_stack+0x63/0x81
Jul 24 15:20:38 kernel: [5568798.143902]  [<ffffffff8120648c>] dump_header+0x7b/0x1fd
Jul 24 15:20:38 kernel: [5568798.146412]  [<ffffffff8119ea90>] ? do_try_to_free_pages+0x2a0/0x300
Jul 24 15:20:38 kernel: [5568798.149157]  [<ffffffff8118936b>] oom_kill_process+0x20b/0x3e0
Jul 24 15:20:38 kernel: [5568798.151760]  [<ffffffff811899a7>] out_of_memory+0x297/0x4c0
Jul 24 15:20:38 kernel: [5568798.154271]  [<ffffffff8118ec80>] __alloc_pages_slowpath+0x9f0/0xc10
Jul 24 15:20:38 kernel: [5568798.157061]  [<ffffffff8118effc>] __alloc_pages_nodemask+0x15c/0x230
Jul 24 15:20:38 kernel: [5568798.160111]  [<ffffffff811de823>] alloc_pages_current+0x93/0x150
Jul 24 15:20:38 kernel: [5568798.162835]  [<ffffffff81184879>] __page_cache_alloc+0xc9/0xe0
Jul 24 15:20:38 kernel: [5568798.165445]  [<ffffffff81187a26>] filemap_fault+0x356/0x4d0
Jul 24 15:20:38 kernel: [5568798.168013]  [<ffffffffa00abfd6>] ext4_filemap_fault+0x36/0x50 [ext4]
Jul 24 15:20:38 kernel: [5568798.170906]  [<ffffffff811b79a4>] __do_fault+0x74/0xe0
Jul 24 15:20:38 kernel: [5568798.173685]  [<ffffffff811bbf28>] handle_mm_fault+0xcc8/0x12b0
Jul 24 15:20:38 kernel: [5568798.176405]  [<ffffffff812220a1>] ? __dentry_kill+0x121/0x160
Jul 24 15:20:38 kernel: [5568798.179185]  [<ffffffff81222111>] ? dput.part.24+0x31/0x240
Jul 24 15:20:38 kernel: [5568798.181807]  [<ffffffff81064702>] __do_page_fault+0x232/0x4b0
Jul 24 15:20:38 kernel: [5568798.184408]  [<ffffffff810649e7>] trace_do_page_fault+0x37/0xd0
Jul 24 15:20:38 kernel: [5568798.187059]  [<ffffffff8105e044>] do_async_page_fault+0x54/0x70
Jul 24 15:20:38 kernel: [5568798.189701]  [<ffffffff81559278>] async_page_fault+0x28/0x30
Jul 24 15:20:38 kernel: [5568798.192562] Mem-Info:
Jul 24 15:20:38 kernel: [5568798.194329] active_anon:7799274 inactive_anon:9 isolated_anon:0
Jul 24 15:20:38 kernel: [5568798.194329]  active_file:1085 inactive_file:1036 isolated_file:32
Jul 24 15:20:38 kernel: [5568798.194329]  unevictable:0 dirty:12 writeback:0 unstable:0
Jul 24 15:20:38 kernel: [5568798.194329]  slab_reclaimable:13997 slab_unreclaimable:54242
Jul 24 15:20:38 kernel: [5568798.194329]  mapped:1343 shmem:13 pagetables:37659 bounce:0
Jul 24 15:20:38 kernel: [5568798.194329]  free:49326 free_pcp:482 free_cma:0
Jul 24 15:20:38 kernel: [5568798.209612] Node 0 active_anon:31197096kB inactive_anon:36kB active_file:2216kB inactive_file:2124kB unevictable:0kB isolated(anon):0kB isolated(file):128kB mapped:1316kB dirty:48kB writeback:0kB shmem:52kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB pages_scanned:6789 all_unreclaimable? no
Jul 24 15:20:38 kernel: [5568798.222372] Node 0 DMA free:15908kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jul 24 15:20:38 kernel: [5568798.235491] lowmem_reserve[]: 0 2962 31344 31344
Jul 24 15:20:38 kernel: [5568798.238006] Node 0 DMA32 free:121688kB min:6384kB low:9416kB high:12448kB active_anon:2748340kB inactive_anon:0kB active_file:108kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129316kB managed:3051456kB mlocked:0kB slab_reclaimable:6032kB slab_unreclaimable:108960kB kernel_stack:660kB pagetables:8008kB bounce:0kB free_pcp:1596kB local_pcp:0kB free_cma:0kB
Jul 24 15:20:38 kernel: [5568798.252147] lowmem_reserve[]: 0 0 28382 28382
Jul 24 15:20:38 kernel: [5568798.254561] Node 0 Normal free:61136kB min:61164kB low:90224kB high:119284kB active_anon:28448756kB inactive_anon:36kB active_file:2644kB inactive_file:2808kB unevictable:0kB writepending:56kB present:29577216kB managed:29063492kB mlocked:0kB slab_reclaimable:49956kB slab_unreclaimable:108008kB kernel_stack:31452kB pagetables:142628kB bounce:0kB free_pcp:2108kB local_pcp:0kB free_cma:0kB
Jul 24 15:20:38 kernel: [5568798.269009] lowmem_reserve[]: 0 0 0 0
Jul 24 15:20:38 kernel: [5568798.271285] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
Jul 24 15:20:38 kernel: [5568798.277864] Node 0 DMA32: 968*4kB (UME) 11618*8kB (UME) 1502*16kB (UME) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 120848kB
Jul 24 15:20:38 kernel: [5568798.284077] Node 0 Normal: 3479*4kB (UMEH) 5890*8kB (UMEH) 7*16kB (UH) 1*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 61244kB
Jul 24 15:20:38 kernel: [5568798.290682] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jul 24 15:20:38 kernel: [5568798.295123] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jul 24 15:20:38 kernel: [5568798.299327] 2515 total pagecache pages
Jul 24 15:20:38 kernel: [5568798.301376] 0 pages in swap cache
Jul 24 15:20:38 kernel: [5568798.303319] Swap cache stats: add 0, delete 0, find 0/0
Jul 24 15:20:38 kernel: [5568798.305764] Free swap  = 0kB
Jul 24 15:20:38 kernel: [5568798.307534] Total swap = 0kB
Jul 24 15:20:38 kernel: [5568798.309399] 8180631 pages RAM
Jul 24 15:20:38 kernel: [5568798.311239] 0 pages HighMem/MovableOnly
Jul 24 15:20:38 kernel: [5568798.313462] 147917 pages reserved
Jul 24 15:20:38 kernel: [5568798.315484] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Jul 24 15:20:38 kernel: [5568798.319795] [ 1364]     0  1364     2769      119      11       3        0         -1000 udevd
Jul 24 15:20:38 kernel: [5568798.324201] [ 1683]     0  1683    27274       57      21       3        0             0 lvmetad
Jul 24 15:20:38 kernel: [5568798.328678] [ 1692]     0  1692     6788       48      16       3        0             0 lvmpolld
Jul 24 15:20:38 kernel: [5568798.333184] [ 1894]     0  1894     2342      124       9       3        0             0 dhclient
Jul 24 15:20:38 kernel: [5568798.337678] [ 2024]     0  2024     2342      120       9       3        0             0 dhclient
Jul 24 15:20:38 kernel: [5568798.342152] [ 2066]     0  2066   228669     1354      51       6        0             0 amazon-ssm-agen
Jul 24 15:20:38 kernel: [5568798.346978] [ 2074]     0  2074    13240      106      26       3        0         -1000 auditd
Jul 24 15:20:38 kernel: [5568798.351529] [ 2110]     0  2110    62040     1226      25       4        0             0 rsyslogd
Jul 24 15:20:38 kernel: [5568798.356039] [ 2124]     0  2124    23214       56      15       3        0             0 irqbalance
Jul 24 15:20:38 kernel: [5568798.360769] [ 2134]     0  2134     1619       25       9       3        0             0 rngd
Jul 24 15:20:38 kernel: [5568798.365175] [ 2153]    32  2153     8830       98      21       3        0             0 rpcbind
Jul 24 15:20:38 kernel: [5568798.369722] [ 2174]    29  2174     9972      202      24       3        0             0 rpc.statd
Jul 24 15:20:38 kernel: [5568798.374257] [ 2205]    81  2205     5450       59      15       3        0             0 dbus-daemon
Jul 24 15:20:38 kernel: [5568798.378857] [ 2240]     0  2240     1088       36       8       3        0             0 acpid
Jul 24 15:20:38 kernel: [5568798.383346] [ 2331]     0  2331    20123      207      41       3        0         -1000 sshd
Jul 24 15:20:38 kernel: [5568798.387703] [ 2341]    38  2341     7443      143      19       3        0             0 ntpd
Jul 24 15:20:38 kernel: [5568798.392048] [ 2361]     0  2361    22383      434      45       3        0             0 sendmail
Jul 24 15:20:38 kernel: [5568798.396449] [ 2370]    51  2370    20247      373      42       3        0             0 sendmail
Jul 24 15:20:38 kernel: [5568798.401024] [ 2390]   498  2390  5455340  4594693   10546      23        0             0 mongod
Jul 24 15:20:38 kernel: [5568798.405523] [ 2425]     0  2425    30402      150      16       3        0             0 crond
Jul 24 15:20:38 kernel: [5568798.409844] [ 2439]     0  2439     4786       42      14       3        0             0 atd
Jul 24 15:20:38 kernel: [5568798.414247] [ 2463]     0  2463     1616       30       8       3        0             0 agetty
Jul 24 15:20:39 kernel: [5568798.418629] [ 2464]     0  2464     1079       25       8       3        0             0 mingetty
Jul 24 15:20:39 kernel: [5568798.423145] [ 2468]     0  2468     1079       24       8       3        0             0 mingetty
Jul 24 15:20:39 kernel: [5568798.427649] [ 2471]     0  2471     1079       25       8       3        0             0 mingetty
Jul 24 15:20:39 kernel: [5568798.432105] [ 2473]     0  2473     1079       24       8       3        0             0 mingetty
Jul 24 15:20:39 kernel: [5568798.436530] [ 2475]     0  2475     1079       24       7       3        0             0 mingetty
Jul 24 15:20:39 kernel: [5568798.440954] [ 2477]     0  2477     1079       23       8       3        0             0 mingetty
Jul 24 15:20:39 kernel: [5568798.445431] [ 2479]     0  2479     2720       93      10       3        0         -1000 udevd
Jul 24 15:20:39 kernel: [5568798.449903] [ 2480]     0  2480     2735      108      10       3        0         -1000 udevd
Jul 24 15:20:39 kernel: [5568798.454316] [ 2603]     0  2603   236602    12220     120      78        0             0 PM2 v2.10.1: Go
Jul 24 15:20:39 kernel: [5568798.458872] [ 2342]     0  2342   348637   260679     626       4        0             0 redis-server
Jul 24 15:20:39 kernel: [5568798.463472] [ 2088]     0  2088   964120   652326    5788    5189        0             0 node /data/run/
Jul 24 15:20:39 kernel: [5568798.468161] [ 2206]     0  2206  1603893  1223470   10766    9619        0             0 node /data/run/
Jul 24 15:20:39 kernel: [5568798.472760] [ 2248]     0  2248  1218477   887950    7826    7062        0             0 node /data/run/
Jul 24 15:20:39 kernel: [5568798.477369] [ 2360]     0  2360    14788      258      31       3        0             0 nginx
Jul 24 15:20:39 kernel: [5568798.481750] [ 2362]   497  2362    33842     3697      40       3        0             0 nginx
Jul 24 15:20:39 kernel: [5568798.486126] [ 2363]   497  2363    34056     4379      41       3        0             0 nginx
Jul 24 15:20:39 kernel: [5568798.490443] [ 2364]   497  2364    17990     3460      39       3        0             0 nginx
Jul 24 15:20:39 kernel: [5568798.494838] [ 2365]   497  2365    20188     5658      43       3        0             0 nginx
Jul 24 15:20:39 kernel: [5568798.499317] [ 2366]   497  2366    34193     4523      41       3        0             0 nginx
Jul 24 15:20:39 kernel: [5568798.503615] [ 2368]   497  2368    20859     6330      44       3        0             0 nginx
Jul 24 15:20:39 kernel: [5568798.508025] [ 2369]   497  2369    33359    18829      69       3        0             0 nginx
Jul 24 15:20:39 kernel: [5568798.512383] [ 2371]   497  2371    56394    41852     114       3        0             0 nginx
Jul 24 15:20:39 kernel: [5568798.516709] [ 6057]   500  6057   229772     3232      64      27        0             0 PM2 v2.10.1: Go
Jul 24 15:20:39 kernel: [5568798.521439] [ 6138]     0  6138   244876    16669     163     121        0             0 node /data/run/
Jul 24 15:20:39 kernel: [5568798.526037] [ 6557]     0  6557   239102     7693      94      51        0             0 node /data/run/
Jul 24 15:20:39 kernel: [5568798.530661] [ 6599]     0  6599   239187     6895      93      49        0             0 node /data/run/
Jul 24 15:20:39 kernel: [5568798.535229] [ 6640]     0  6640   239256     7183      93      51        0             0 node /data/run/
Jul 24 15:20:39 kernel: [5568798.540055] [ 8254]     0  8254   248967    19723     204     143        0             0 node /data/run/
Jul 24 15:20:39 kernel: [5568798.544584] [10923]     0 10923    20123      210      42       3        0             0 sshd
Jul 24 15:20:39 kernel: [5568798.549000] [10925]     0 10925    20123      208      43       3        0             0 sshd
Jul 24 15:20:39 kernel: [5568798.553357] [10929]     0 10929    19040       83      38       3        0             0 sshd
Jul 24 15:20:39 kernel: [5568798.557655] [10931]    74 10931    20123      212      42       3        0             0 sshd
Jul 24 15:20:39 kernel: [5568798.561933] [10932]     0 10932    37712      182      30       3        0             0 crond
Jul 24 15:20:39 kernel: [5568798.566423] Out of memory: Kill process 2390 (mongod) score 573 or sacrifice child
Jul 24 15:20:39 kernel: [5568798.620960] Killed process 2390 (mongod) total-vm:21821360kB, anon-rss:18378772kB, file-rss:0kB, shmem-rss:0kB
Jul 24 15:20:56 dhclient[2024]: XMT: Solicit on eth0, interval 109020ms.
Jul 24 15:21:40 kernel: [5568859.765534] TCP: request_sock_TCP: Possible SYN flooding on port 3802. Sending cookies.  Check SNMP counters.
Jul 24 15:22:06 kernel: [5568885.877099] TCP: request_sock_TCP: Possible SYN flooding on port 3801. Sending cookies.  Check SNMP counters.
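The task table in this dump shows where the memory went. A sketch that sums rss per process name, assuming 4 KB pages (the x86_64 default) and input containing a single OOM event; the function name and the excerpt file in the usage line are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum resident memory per process name from the task table
# the OOM killer writes to the kernel log ("[ pid ] uid tgid total_vm rss ... name").
# rss is in pages; 4 KB pages are assumed. Feed it ONE OOM event, otherwise
# the totals add up across events.
oom_rss_by_name() {
  awk '
    sub(/.*\[ *[0-9]+\] +/, "") && NF >= 9 && $4 ~ /^[0-9]+$/ {
      name = $9
      for (i = 10; i <= NF; i++) name = name " " $i   # names may contain spaces
      rss[name] += $4
    }
    END { for (n in rss) printf "%.0f %s\n", rss[n] * 4 / 1024, n }' | sort -rn
}

# Usage: oom_rss_by_name < oom-event.log   # prints "MB name", largest first
```

Over the table above it attributes about 17.5 GB to mongod, about 11 GB to the node processes together and about 1 GB to redis-server, on a 32 GB machine with no swap.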

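Conclusions:

1. With the MMAPv1 storage engine, MongoDB leaves memory management to the operating system, which evicts cold data with an LRU policy. The serverStatus entries in the log above mention wiredTiger, however, and WiredTiger keeps its own internal cache in addition to the filesystem cache. By default that cache is sized at 60% of RAM minus 1 GB on MongoDB 3.2 (about half of RAM from 3.4 on), so on a host shared with other services it should be capped with storage.wiredTiger.engineConfig.cacheSizeGB.
2. MongoDB's in-memory cache can be cleared by restarting the service, by adjusting kernel parameters, or with MongoDB's own commands. The catch is that each of these clears everything, so the cache is cold afterwards.
3. MongoDB's memory use can be monitored (for example the vsize and res columns of mongostat), and in production it should be checked regularly.
4. Because of how MongoDB uses memory, deploy it separately from other memory-hungry services such as memcache, Sphinx and MySQL.
5. Frequent swapping severely degrades system performance. The remedies are disabling swap, adding memory, or distributing the load.
6. In production, the MongoDB host should have as much memory as possible.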
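For conclusion 3, one lightweight option is to log mongostat's res column over time. A sketch, assuming the MongoDB 3.x mongostat header shown earlier (insert ... vsize res ... time); the function name is illustrative:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<time> <res>" for every sample in `mongostat`
# output on stdin, so mongod's resident memory can be logged over time.
# Columns are found by header name. The "% dirty" / "% used" headers contain a
# space, so they are joined first to keep header and data columns aligned.
mongostat_res() {
  awk '
    $1 == "insert" {
      gsub(/% /, "%")
      for (i = 1; i <= NF; i++) {
        if ($i == "res")  rc = i
        if ($i == "time") tc = i
      }
      next
    }
    rc && tc && NF >= tc { print $tc, $rc }'
}

# Usage: mongostat 5 | mongostat_res    # one sample every 5 seconds
```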

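Configuration upgrade

Current configuration:
CPU: 8 cores
Memory: 32 GB
Disk: 300 GB
Bandwidth: 10M

Current load:
Load: peaked at about 6.41, which is very high; recommend adding CPUs
CPU: at peak, %idle was down to nearly 50%; recommend adding CPUs
Memory: the database uses 55% and each of the 6 game services about 3% (18% in total), close to 80% overall; recommend adding memory
Disk: the database backup on 2018-07-27 was 47 GB, 2 GB compressed; enough for now
Network traffic: peaks at 12.3M/s; recommend more bandwidth

Summary: add CPU, memory and bandwidth
Option 1: upgrade the current server in place
Upgraded configuration:
CPU: 16 cores
Memory: 64 GB
Disk: 300 GB
Bandwidth: 15-20M

Option 2: add a new server and give the database a machine of its own. Note: the new server must be in the same region as the existing one, with a private IP in the same internal subnet.
Database server (the current server, configuration unchanged):
CPU: 8 cores
Memory: 32 GB
Disk: 300 GB
Bandwidth: 10M

Game server:
CPU: 8 cores
Memory: 32 GB
Disk: 300 GB
Bandwidth: 15-20M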

Current load:

Load: peaked at about 6.41, which is very high; recommend adding CPUs

sar -q -f /var/log/sa/sa28

12:00:01 AM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
03:30:01 PM         2       860      3.86      4.15      4.25
03:40:01 PM         2       874      6.41      4.86      4.44
03:50:01 PM         3       931      4.86      4.65      4.48
04:00:01 PM         4      1055      4.73      4.25      4.28
04:10:01 PM         8      1141      5.66      5.07      4.64
04:20:01 PM         4      1204      3.60      4.11      4.35
04:30:01 PM         3      1244      4.18      3.95      4.10
04:40:01 PM         6      1270      3.47      4.05      4.13
04:50:01 PM         6      1297      3.72      4.18      4.21
05:00:01 PM         9      1310      4.34      4.11      4.22
05:10:01 PM         9      1310      3.55      3.62      3.96
Average:            2       747      2.46      2.43      2.43

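Output fields:
runq-sz: run queue length (number of tasks waiting to run)
plist-sz: number of processes and threads in the task list
ldavg-1: system load average over the last 1 minute
ldavg-5: system load average over the past 5 minutes
ldavg-15: system load average over the past 15 minutes

CPU: at peak, %idle was down to nearly 50%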

sar -P ALL -f /var/log/sa/sa28

04:40:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
04:50:01 PM     all     44.11      0.00      3.81      0.18      0.00     51.90
04:50:01 PM       0     47.12      0.00      3.87      0.23      0.00     48.78
04:50:01 PM       1     44.83      0.00      3.79      0.16      0.00     51.22
04:50:01 PM       2     45.84      0.00      3.70      0.26      0.00     50.20
04:50:01 PM       3     48.81      0.00      3.59      0.17      0.00     47.44
04:50:01 PM       4     42.89      0.00      4.07      0.17      0.00     52.88
04:50:01 PM       5     43.15      0.00      3.71      0.17      0.00     52.97
04:50:01 PM       6     39.87      0.00      3.96      0.16      0.00     56.01
04:50:01 PM       7     40.40      0.00      3.83      0.13      0.00     55.65

04:50:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
05:00:01 PM     all     42.46      0.00      3.78      0.17      0.00     53.59
05:00:01 PM       0     44.11      0.00      3.98      0.23      0.00     51.68
05:00:01 PM       1     40.85      0.00      3.81      0.14      0.00     55.20
05:00:01 PM       2     43.11      0.00      3.65      0.27      0.00     52.97
05:00:01 PM       3     40.82      0.00      3.82      0.18      0.00     55.17
05:00:01 PM       4     43.38      0.00      3.99      0.15      0.00     52.49
05:00:01 PM       5     47.85      0.00      3.38      0.14      0.00     48.63
05:00:01 PM       6     43.15      0.00      3.76      0.13      0.00     52.96
05:00:01 PM       7     36.41      0.00      3.82      0.13      0.00     59.64

05:00:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
05:10:01 PM     all     40.20      0.00      3.65      0.18      0.00     55.97
05:10:01 PM       0     46.56      0.00      3.48      0.22      0.00     49.73
05:10:01 PM       1     38.42      0.00      3.81      0.15      0.00     57.62
05:10:01 PM       2     46.46      0.00      3.32      0.25      0.00     49.97
05:10:01 PM       3     39.75      0.00      3.69      0.20      0.00     56.36
05:10:01 PM       4     36.86      0.00      4.15      0.18      0.00     58.81
05:10:01 PM       5     45.14      0.00      3.29      0.15      0.00     51.41
05:10:01 PM       6     36.50      0.00      3.61      0.14      0.00     59.75
05:10:01 PM       7     31.90      0.00      3.84      0.14      0.00     64.12

Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all     22.27      0.00      2.23      1.70      0.00     73.80
Average:          0     23.59      0.00      2.33      3.54      0.00     70.54
Average:          1     22.27      0.00      2.20      1.45      0.00     74.08
Average:          2     23.05      0.00      2.18      3.88      0.00     70.88
Average:          3     22.67      0.00      2.17      1.63      0.00     73.53
Average:          4     22.06      0.00      2.35      0.98      0.00     74.61
Average:          5     21.79      0.00      2.20      0.82      0.00     75.19
Average:          6     21.46      0.00      2.20      0.65      0.00     75.69
Average:          7     21.27      0.00      2.19      0.64      0.00     75.90

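Output fields:
CPU: "all" means the statistics are averaged over all CPUs.
%user: percentage of CPU time spent executing processes in user mode.
%nice: percentage of CPU time spent in user mode on processes with an adjusted (nice) priority.
%system: percentage of CPU time spent executing in kernel mode.
%iowait: percentage of time the CPU was idle while the system had an outstanding disk I/O request.
%steal: percentage of time the virtual CPU spent in involuntary wait while the hypervisor was servicing another virtual processor.
%idle: percentage of time the CPU was idle with no outstanding disk I/O request.
1. If %iowait is high, the disk has an I/O bottleneck.
2. If %idle is high but the system still responds slowly, the CPU may be waiting for memory to be allocated; consider adding memory.
3. If %idle stays below 1, CPU capacity is insufficient, and the CPU is the resource that most needs attention.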
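The first and third rules of thumb can be checked against saved sar files automatically. Below is a minimal bash sketch. It assumes the column layout shown above, where the last seven columns are CPU, %user, %nice, %system, %iowait, %steal and %idle. The helper name check_cpu and the default thresholds (20% iowait, 1% idle) are illustrative assumptions, not values defined by sar.

```shell
#!/usr/bin/env bash
# check_cpu (hypothetical helper): read `sar -u` or `sar -P ALL` output on stdin
# and flag samples that match the rules of thumb above.
# Usage: sar -P ALL -f /var/log/sa/sa28 | check_cpu [IOWAIT_MAX] [IDLE_MIN]
check_cpu() {
  local iowait_max=${1:-20} idle_min=${2:-1}   # assumed thresholds, in percent
  awk -v iow="$iowait_max" -v idl="$idle_min" '
    # Data rows end in: CPU %user %nice %system %iowait %steal %idle.
    # Header rows are skipped because their last field ("%idle") is not numeric.
    NF >= 7 && $NF ~ /^[0-9.]+$/ {
      cpu = $(NF-6); iowait = $(NF-2) + 0; idle = $NF + 0
      if (iowait > iow + 0)
        printf "cpu %s: %%iowait %.2f (possible disk I/O bottleneck)\n", cpu, iowait
      if (idle < idl + 0)
        printf "cpu %s: %%idle %.2f (CPU is the bottleneck)\n", cpu, idle
    }'
}
```

For the rows shown above the function prints nothing: %iowait stays well under the threshold and %idle stays around 50%. The second rule (high %idle but slow responses) depends on how the system feels to users, so it is not automated here.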

top

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
19992 root      20   0 1354m 413m 7828 S 88.5  1.3 672:40.39 node /data/run/
15261 mongod    20   0 17.8g  16g 5548 S 48.9 55.1   1206:54 mongod
14433 root      20   0 1884m 973m 6220 R 37.3  3.1 980:02.50 node /data/run/
14353 root      20   0 2175m 1.2g 7324 S 22.0  4.1   1181:22 node /data/run/
14393 root      20   0 2024m 1.1g 6160 S 20.6  3.6   1104:46 node /data/run/
14315 root      20   0 2152m 1.2g 6032 S 19.0  4.0   1217:23 node /data/run/
14473 root      20   0 2009m 1.1g 7168 S 17.0  3.5   1032:29 node /data/run/
13360 nginx     20   0  235m 179m 1140 S  5.3  0.6 305:41.86 nginx
13361 nginx     20   0  142m  86m  628 S  3.3  0.3 233:03.44 nginx
23415 root      20   0 1035m 106m 7312 S  2.3  0.3  95:20.07 node /data/run/
 2342 root      20   0 1361m 5420  884 S  0.7  0.0 176:42.46 redis-server

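Memory: the database (mongod) uses 55%, and the 6 game services use about 3% each (about 18% combined), so together they use close to 80%. Adding memory is recommended.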
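The totals above were read off top's %MEM column by hand. A small bash sketch can instead sum the memory share per command name. It assumes procps `ps`, where `-eo comm=,pmem=` prints each process's command name and %MEM with no header line. mem_by_command is a hypothetical helper name. Note that grouping is by command name only, so all node processes (the game services and any other Node.js processes) are counted together.

```shell
#!/usr/bin/env bash
# mem_by_command (hypothetical helper): read "COMMAND %MEM" pairs on stdin,
# sum %MEM per command name and print the totals, largest first.
# Lines whose command name contains spaces (more than two fields) are skipped.
# Usage: ps -eo comm=,pmem= | mem_by_command
mem_by_command() {
  awk 'NF == 2 { total[$1] += $2 }
       END { for (cmd in total) printf "%s %.1f\n", cmd, total[cmd] }' |
    LC_ALL=C sort -k2,2nr
}
```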

sar -r -f /var/log/sa/sa28

12:00:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
04:20:01 AM    412732  31718124     98.72    170364  10067084  24350868     75.79
04:30:01 AM    334156  31796700     98.96    171408  10116788  24350544     75.79
04:40:01 AM    272740  31858116     99.15    172508  10164292  24355892     75.80
04:50:01 AM    357628  31773228     98.89    173500  10047584  24404816     75.95
05:00:01 AM    309484  31821372     99.04    174548  10096140  24403892     75.95
05:10:01 AM    290952  31839904     99.09    175520  10083396  24452340     76.10
05:20:01 AM    313864  31816992     99.02    176512  10031388  24468580     76.15
05:30:01 AM    282832  31848024     99.12    177296  10021748  24467008     76.15
05:40:01 AM    273124  31857732     99.15    178060  10039852  24492204     76.23
05:50:01 AM    301356  31829500     99.06    178796   9990804  24496000     76.24
06:00:01 AM    277256  31853600     99.14    179576   9982468  24519824     76.31
Average:       722083  31408773     97.75    186464   8734913  25284052     78.69

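Output fields:
kbmemfree: free physical memory (kB)
kbmemused: physical memory in use (kB), including buffers and cache
%memused: percentage of physical memory in use
kbbuffers: memory used by the kernel as buffers (kB)
kbcached: memory used by the kernel to cache data (kB)
kbcommit: memory (kB) needed for the current workload, an estimate of how much RAM and swap is needed to guarantee there is no out-of-memory condition
%commit: kbcommit as a percentage of total memory (RAM + swap); it can exceed 100% because the kernel usually overcommits memory
kbswpfree: free swap space (kB; reported by sar -S, not shown above)
kbswpused: swap space in use (kB; reported by sar -S, not shown above)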

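Using the 04:20:01 AM sample:
Available memory = free + buffers + cached = 412732 + 170364 + 10067084 = 10650180 kB (≈ 10.16 GiB)
Memory used by programs = used - buffers - cached = 31718124 - 170364 - 10067084 = 21480676 kB (≈ 20.49 GiB)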
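The same two formulas can be applied to every row of a sar -r report. Below is a bash sketch. It assumes the seven-column layout shown above; newer sysstat versions add columns such as kbavail, which would shift the positions. sar_mem_summary is a hypothetical helper name. Because sar's kB values come from /proc/meminfo and are 1024-byte units, the sketch also reports GiB.

```shell
#!/usr/bin/env bash
# sar_mem_summary (hypothetical helper): apply the two formulas above to `sar -r` rows.
# Assumes the rows end in: kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
# Usage: sar -r -f /var/log/sa/sa28 | sar_mem_summary
sar_mem_summary() {
  awk '$NF ~ /^[0-9.]+$/ && NF >= 8 {
    free = $(NF-6); used = $(NF-5); buffers = $(NF-3); cached = $(NF-2)
    avail = free + buffers + cached    # available = free + buffers + cached
    real  = used - buffers - cached    # used by programs = used - buffers - cached
    printf "%s available=%d kB (%.2f GiB) used=%d kB (%.2f GiB)\n",
           $1, avail, avail / 1048576, real, real / 1048576
  }'
}
```

For example, the 04:20:01 AM row gives `04:20:01 available=10650180 kB (10.16 GiB) used=21480676 kB (20.49 GiB)`.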

Swapping (pages swapped in and out per second):

sar -W -f /var/log/sa/sa28

12:00:01 AM  pswpin/s pswpout/s
12:10:01 AM      0.00      0.00
12:20:01 AM      0.00      0.00
12:30:01 AM      0.00      0.00
12:40:01 AM      0.00      0.00
12:50:01 AM      0.00      0.00
Average:         0.00      0.00

I/O and transfer rates:

sar -b -f /var/log/sa/sa28

12:00:01 AM       tps      rtps      wtps   bread/s   bwrtn/s
09:50:01 PM     20.57      0.05     20.52      1.00    339.30
10:00:01 PM     38.17     18.59     19.59    281.84    745.23
10:10:01 PM   2804.77   2758.81     45.96  39721.24   6959.51
10:20:01 PM    158.94    138.17     20.76   2287.52   1612.53
10:30:01 PM     17.17      1.83     15.33     55.23    255.61
10:40:01 PM     16.78      1.55     15.23     54.01    254.35
Average:       128.15     72.36     55.80   1031.67   1199.53

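Output fields:
tps:     total number of transfers (I/O requests) per second issued to physical devices
rtps:    read requests per second issued to physical devices
wtps:    write requests per second issued to physical devices
bread/s: data read from the devices, in blocks per second (1 block = 512 bytes)
bwrtn/s: data written to the devices, in blocks per second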
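Because bread/s and bwrtn/s are counted in 512-byte blocks, halving them gives kB/s. Below is a small bash sketch that assumes the five-column layout shown above. sar_io_kb is a hypothetical helper name. For the 10:10:01 PM peak, 39721.24 blocks/s read works out to about 19860.62 kB/s, or roughly 19.4 MiB/s.

```shell
#!/usr/bin/env bash
# sar_io_kb (hypothetical helper): convert bread/s and bwrtn/s from `sar -b`
# (512-byte blocks per second) to kB/s: 1 block = 512 bytes = 0.5 kB.
# Assumes the rows end in: tps rtps wtps bread/s bwrtn/s
# Usage: sar -b -f /var/log/sa/sa28 | sar_io_kb
sar_io_kb() {
  awk '$NF ~ /^[0-9.]+$/ && NF >= 6 {
    printf "%s read=%.2f kB/s write=%.2f kB/s\n", $1, $(NF-1) / 2, $NF / 2
  }'
}
```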

sar -d -f /var/log/sa/sa28

12:00:01 AM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
12:10:01 AM  dev259-0      0.92      0.07     21.76     23.61      0.00      0.19      0.19      0.02
12:10:01 AM  dev259-1     13.10      1.80    177.23     13.66      0.01      0.90      0.29      0.38
12:20:01 AM  dev259-0      0.83      0.00     22.27     26.69      0.00      0.26      0.23      0.02
12:20:01 AM  dev259-1     13.48      2.92    181.07     13.65      0.02      1.36      0.29      0.39
12:30:01 AM  dev259-0      0.89      0.00     23.87     26.69      0.00      0.19      0.19      0.02
12:30:01 AM  dev259-1     15.79      3.93    209.18     13.50      0.01      0.92      0.27      0.43
12:40:01 AM  dev259-0      0.97      0.68     24.38     25.85      0.00      0.22      0.22      0.02
12:40:01 AM  dev259-1   2218.39  30226.67   3315.07     15.12      1.23      0.56      0.32     70.42
12:50:01 AM  dev259-0      0.92      0.54     25.46     28.12      0.00      0.20      0.17      0.02
12:50:01 AM  dev259-1   2133.46  29776.96   3717.93     15.70      1.10      0.52      0.30     63.12
01:00:03 AM  dev259-0      0.85      0.00     26.41     30.96      0.00      0.16      0.16      0.01
01:00:03 AM  dev259-1    559.56   7719.08   2357.59     18.01      4.69      8.35      1.32     74.02
01:10:01 AM  dev259-0      0.90      0.03     27.69     30.78      0.00      0.19      0.18      0.02
01:10:01 AM  dev259-1    106.25   1138.62   1012.30     20.24      7.05     66.49      9.35     99.30
01:20:01 AM  dev259-0      0.87      0.00     29.84     34.12      0.00      0.22      0.22      0.02
01:20:01 AM  dev259-1    106.07   1060.89   1878.07     27.71      7.13     67.23      9.23     97.90
01:30:01 AM  dev259-0      0.90      0.00     30.81     34.04      0.00      0.22      0.22      0.02
01:30:01 AM  dev259-1    105.17   1210.76    755.42     18.70      6.36     60.40      9.51    100.06
01:40:01 AM  dev259-0      0.91      0.00     31.29     34.51      0.00      0.18      0.18      0.02
01:40:01 AM  dev259-1    104.93   1140.88    764.34     18.16      6.89     65.73      9.53     99.98
Average:     dev259-0      1.14      1.33     53.40     48.17      0.00      0.26      0.21      0.02
Average:     dev259-1    127.02   1030.34   1146.13     17.14      0.67      5.28      0.81     10.30

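Output fields:
await: average time (ms) for each I/O request issued to the device, including the time spent waiting in the queue.
svctm: average service time (ms) of each I/O request (recent sysstat versions warn that this field is no longer reliable).
%util: percentage of elapsed time during which I/O requests were being issued to the device.
Rules of thumb for judging disk I/O performance:
svctm should normally be smaller than await. svctm depends on disk performance, but CPU and memory load also affect it, and too many requests indirectly increase it.
await mainly depends on svctm, the I/O queue length and the I/O request pattern. If svctm is close to await, there is almost no I/O waiting and the disk is performing well. If await is much higher than svctm, the I/O queue is too long and applications on the system will slow down; a faster disk can help in this case.
%util is another important disk I/O indicator. If %util is close to 100%, the disk is receiving too many I/O requests and the I/O system is running at full capacity, so the disk may be a bottleneck. Left unaddressed, this will hurt system performance; optimize the application or move to a faster disk.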
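The two disk checks can be run over a whole sar -d report: %util close to 100%, or await far above svctm. Below is a bash sketch. The thresholds (90% utilization and a 5x await/svctm ratio) and the helper name sar_disk_flags are assumptions. On the data above it flags dev259-1 from about 01:00 AM onward, plus its Average row. From 01:10 AM, %util there stays near 100% and await (60 to 67 ms) is roughly six to seven times svctm.

```shell
#!/usr/bin/env bash
# sar_disk_flags (hypothetical helper): print `sar -d` rows where %util is close
# to 100% or await is much larger than svctm (long I/O queue).
# Assumes the rows end in: DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
# Usage: sar -d -f /var/log/sa/sa28 | sar_disk_flags [UTIL_MIN] [RATIO]
sar_disk_flags() {
  local util_min=${1:-90} ratio=${2:-5}   # assumed thresholds
  awk -v umin="$util_min" -v r="$ratio" '
    $NF ~ /^[0-9.]+$/ && NF >= 10 {
      dev = $(NF-8); await = $(NF-2) + 0; svctm = $(NF-1) + 0; util = $NF + 0
      if (util >= umin + 0 || (svctm > 0 && await > r * svctm))
        printf "%s %s %%util=%.2f await=%.2f svctm=%.2f\n", $1, dev, util, await, svctm
    }'
}
```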

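Network traffic: the peak is about 12.3 MB/s (rxkB/s 12359.29 at 04:50 PM), on the loopback interface lo, i.e. traffic between processes on the same machine. The external interface eth0 peaks at under 0.9 MB/s.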

sar -n DEV -f /var/log/sa/sa28

12:30:01 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
04:20:01 PM        lo  13999.38  13999.38  12268.18  12268.18      0.00      0.00      0.00
04:30:01 PM      eth0   6133.45   5705.68    679.22    808.76      0.00      0.00      0.00
04:30:01 PM        lo  14167.40  14167.40  12258.86  12258.86      0.00      0.00      0.00
04:40:01 PM      eth0   6129.91   5690.51    681.00    817.57      0.00      0.00      0.00
04:40:01 PM        lo  14397.23  14397.23  12307.96  12307.96      0.00      0.00      0.00
04:50:01 PM      eth0   6195.80   5794.20    696.24    837.08      0.00      0.00      0.00
04:50:01 PM        lo  14805.51  14805.51  12359.29  12359.29      0.00      0.00      0.00
05:00:01 PM      eth0   6179.53   5756.83    700.91    836.21      0.00      0.00      0.00
05:00:01 PM        lo  14514.45  14514.45  11935.72  11935.72      0.00      0.00      0.00
05:10:01 PM      eth0   6192.71   5686.02    703.21    826.26      0.00      0.00      0.00
05:10:01 PM        lo  14172.06  14172.06  11091.33  11091.33      0.00      0.00      0.00
05:20:01 PM      eth0   6367.97   5781.26    724.36    852.73      0.00      0.00      0.00
05:20:01 PM        lo  14447.40  14447.40  11268.92  11268.92      0.00      0.00      0.00
05:30:01 PM      eth0   6615.19   5922.98    758.45    885.70      0.00      0.00      0.00
05:30:01 PM        lo  14979.11  14979.11  11289.22  11289.22      0.00      0.00      0.00
Average:         eth0   4023.18   3584.38    454.18    533.71      0.00      0.00      0.00
Average:           lo   8982.72   8982.72   6130.97   6130.97      0.00      0.00      0.00

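IFACE: network interface name
rxpck/s: packets received per second
txpck/s: packets transmitted per second
rxkB/s: kilobytes received per second
txkB/s: kilobytes transmitted per second
rxcmp/s: compressed packets received per second
txcmp/s: compressed packets transmitted per second
rxmcst/s: multicast packets received per second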
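Per-interface peaks like the one discussed above can be extracted from a sar -n DEV report. Below is a bash sketch. It assumes the eight-column layout shown above; newer sysstat versions append %ifutil, which would shift the positions. sar_net_peaks is a hypothetical helper name. For the rows shown, lo peaks at 12359.29 kB/s. eth0 peaks at 758.45 kB/s received and 885.70 kB/s transmitted.

```shell
#!/usr/bin/env bash
# sar_net_peaks (hypothetical helper): peak rxkB/s and txkB/s per interface.
# Assumes rows end in: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s
# "Average:" rows are ignored so that only real samples count.
# Usage: sar -n DEV -f /var/log/sa/sa28 | sar_net_peaks
sar_net_peaks() {
  awk '$1 != "Average:" && $NF ~ /^[0-9.]+$/ && NF >= 9 {
      ifc = $(NF-7); rx = $(NF-4) + 0; tx = $(NF-3) + 0
      if (!(ifc in maxrx) || rx > maxrx[ifc]) maxrx[ifc] = rx
      if (!(ifc in maxtx) || tx > maxtx[ifc]) maxtx[ifc] = tx
    }
    END {
      for (ifc in maxrx)
        printf "%s rx_max=%.2f kB/s tx_max=%.2f kB/s\n", ifc, maxrx[ifc], maxtx[ifc]
    }' | LC_ALL=C sort
}
```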

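Connecting from Linux to Linux over ssh

Recently the company's network address was blocked by the server, so we log in to another server first and connect to the target server from there.

Copy the .pem file to the relay server.

Change the .pem file permissions to 600 (ssh refuses to use a private key that other users can read):

chmod 600 xxxx.pem

Connect to the server over ssh:

ssh user@192.168.9.3 -i xxxx.pem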

egg ops

1. Set the root password

sudo passwd root

2. Update a newly provisioned server


su
yum update
yum -y install gcc make gcc-c++ openssl-devel wget

3. mongodb

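Because this is a US server, some MongoDB download URLs are unreachable, so use the following repository:

vim /etc/yum.repos.d/mongodb-org-3.2.repo

Contents:

[mongodb-org-3.2]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/redhat/7/mongodb-org/3.2/x86_64/
gpgcheck=0
enabled=1

Install:

yum install mongodb-org -y

Start it once:

service mongod start

Then stop it:

service mongod stop

Edit the configuration file to move the database directory onto the larger disk:

vim /etc/mongod.conf

Change dbPath:

dbPath: /data/db/mongo/mongo

Copy the mongo data directory to the new location:

rsync -av /var/lib/mongo /data/db/mongo/

Keep the original directory as a backup:

mv /var/lib/mongo /var/lib/mongo-bk

Edit the mongod service startup script:

vim /etc/init.d/mongod

daemon --user "$MONGO_USER" --check $mongod "$NUMACTL $mongod $OPTIONS >/dev/null 2>&1"
change it to
daemon $mongod $OPTIONS

Note that the new line no longer passes --user "$MONGO_USER", so mongod runs as root. An alternative is to keep the original line and give the mongod user ownership of the new data directory (chown -R mongod:mongod /data/db/mongo), as the B ops section does for its directories.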

4. nodejs

wget https://npm.taobao.org/mirrors/node/v9.10.1/node-v9.10.1.tar.gz
tar -zvxf node-v9.10.1.tar.gz
cd node-v9.10.1
./configure
make
make install
node -v
npm -v

5. Create a deployment script for the backend application

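#!/bin/sh
# Redeploy the backend: back up the current release, fetch and unpack the new one,
# copy in the locally kept config.default.js and package.json, then reinstall and restart.
# xxxxx is a placeholder for the application name and download host.
SERVER=/data/nodejs/
YMDAY=$(date +%Y-%m-%d-%H-%M-%S)   # timestamp used to name the backup directory
echo "$YMDAY"
cd "$SERVER" || exit 1
pwd
rm -rf xxxxx.zip                   # remove the previously downloaded archive
mv xxxxx "xxxxx-$YMDAY"            # keep the current release as a timestamped backup
wget http://xxxxx/xxxxx.zip
unzip xxxxx.zip
cp config.default.js xxxxx/config/
cp package.json xxxxx/
cd xxxxx || exit 1
pwd
npm i
npm stop
npm start
tail -f /root/logs/xxxxx/xxxxx-web.log   # follow the log to check that the app started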

6. Install the iftop monitoring tool

yum -y install ncurses-devel libpcap-devel
wget http://www.ex-parrot.com/~pdw/iftop/download/iftop-0.17.tar.gz
tar -xzvf iftop-0.17.tar.gz
cd iftop-0.17
./configure
make
make install

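7. Limits and other settings

Add a 1024 MB swap file

Create and enable the swap file:
cd /var/
dd if=/dev/zero of=swapfile bs=4096 count=262144
chmod 600 swapfile    # swapon warns if the swap file is readable by other users
/sbin/mkswap swapfile
/sbin/swapon swapfile

Add it to /etc/fstab so that it is enabled automatically at boot:
vi /etc/fstab
Append the following line: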
/var/swapfile     swap swap     defaults     0  0
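Two small checks can help with the swap setup. Below is a bash sketch; both helper names are hypothetical. swap_dd_count computes the dd block count as size in MB × 1024 × 1024 / block size, so 1024 MB with bs=4096 gives the 262144 used above. swap_total_kb sums the Size column of /proc/swaps, which is in kB, to confirm that the file is active after swapon.

```shell
#!/usr/bin/env bash
# swap_dd_count (hypothetical helper): dd count for a swap file of SIZE_MB megabytes
# written in blocks of BS bytes.
swap_dd_count() {
  local size_mb=$1 bs=${2:-4096}
  echo $(( size_mb * 1024 * 1024 / bs ))
}

# swap_total_kb (hypothetical helper): total active swap in kB from input in the
# /proc/swaps format (header line, then: Filename Type Size Used Priority).
# Usage: swap_total_kb < /proc/swaps
swap_total_kb() {
  awk 'NR > 1 { total += $3 } END { printf "%d\n", total }'
}
```

After /sbin/swapon, `swap_total_kb < /proc/swaps` should report roughly 1048576 kB for the 1024 MB file. It prints 0 if no swap is active.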

Open file descriptor limit

Check the current limit:
ulimit -n
Edit the file:
vim /etc/security/limits.conf

Contents:

* soft nofile 1048576
* hard nofile 1048576
@root  soft nofile 1048576
@root  hard nofile 1048576

Check the kernel-wide limits:

cat /proc/sys/fs/file-max
cat /proc/sys/fs/nr_open

Reboot the system.

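8. Network proxy tool install script

https://teddysun.com/357.html

Supported environment:
OS: CentOS
Memory: ≥ 128 MB
Date: 2018-06-01

About this script:
Installs the latest libev version of the network proxy tool in one step. This version uses little memory (around 600 KB) and little CPU, and can even run on OpenWRT-based routers.
Tip: if you run into problems, read "Network Proxy Tool Troubleshooting" before asking.


Default settings:
Server port: your choice (if not set, a random port between 9000 and 19999 is used)
Password: your choice (default: teddysun.com)
Encryption method: your choice (default: aes-256-gcm)

Network proxy tool client for Windows, download:
https://github.com/network-proxy-tool/network-proxy-tool-windows/releases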

Usage:
Log in as root and run:

wget --no-check-certificate -O proxy-tool.sh https://raw.githubusercontent.com/network-proxy-tool/install/master/proxy-tool-libev.sh
chmod +x proxy-tool.sh
./proxy-tool.sh 2>&1 | tee proxy-tool.log
When installation finishes, the script prints:

Congratulations, network proxy tool server install completed!
Your Server IP        :your_server_ip
Your Server Port      :your_server_port
Your Password         :your_password
Your Encryption Method:your_encryption_method

Welcome to visit:https://teddysun.com/357.html
Enjoy it!
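Uninstall:
Log in as root and run:

./proxy-tool.sh uninstall
Other notes:
Client configuration reference: https://teddysun.com/339.html

After installation the network proxy tool is already running in the background. Run

/etc/init.d/proxy-tool status
to check whether the process is running.
The script also adds the network proxy tool to the services started at boot.

Commands:

Start: /etc/init.d/proxy-tool start
Stop: /etc/init.d/proxy-tool stop
Restart: /etc/init.d/proxy-tool restart
Status: /etc/init.d/proxy-tool status

Install scripts for other versions of the network proxy tool:

Network proxy tool R edition one-click install script (CentOS, Debian, Ubuntu)
Network proxy tool Python edition one-click install script (CentOS, Debian, Ubuntu)
Network proxy tool one-click install script for Debian
Network proxy tool-go one-click install script (CentOS, Debian, Ubuntu)
Network proxy tool one-click install script (four-in-one)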

B ops

1. Server specs

Amazon Linux AMI (Linux version 4.9.76-3.78.amzn1.x86_64) (Red Hat 7.2.1-2)
Mem 16G
CPU: 1 physical, 4 logical
Disk 300G

2. Initialization

# Set the root password
sudo passwd root
# Count established connections
netstat -nat|grep ESTABLISHED|wc -l
# Switch to root
su
# Update
yum -y update

yum -y install gcc make gcc-c++ wget pcre-devel zlib-devel openssl openssl-devel

3. Install MongoDB

# Install with yum
vim /etc/yum.repos.d/mongodb-org-4.0.repo

Contents:

[mongodb-org-4.0]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/amazon/2013.03/mongodb-org/4.0/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-4.0.asc

Install:

yum install -y mongodb-org
# Start at boot
# chkconfig mongod on
# Create directories
mkdir /data/log
mkdir /data/log/mongo
mkdir /data/db
# Give the mongod user ownership
chown -R mongod:mongod /data/log
chown -R mongod:mongod /data/log/mongo
chown -R mongod:mongod /data/db

Copy the mongo data to the new location:

rsync -av /var/lib/mongo /data/db

Update the configuration:

vim /etc/mongod.conf
# Change the log file path (systemLog.path)
  path: /data/log/mongo/mongod.log
# Change the data path (storage.dbPath)
  dbPath: /data/db/mongo

4. Install Node.js

wget https://npm.taobao.org/mirrors/node/v9.10.1/node-v9.10.1.tar.gz
tar -zvxf node-v9.10.1.tar.gz
cd node-v9.10.1
./configure
make
make install
node -v
npm -v

5. redis

wget http://download.redis.io/redis-stable.tar.gz
tar xvzf redis-stable.tar.gz
cd redis-stable
make
yum install -y tcl
make test
cd src/
cp redis-server /usr/local/bin/
cp redis-cli /usr/local/bin/
cp redis-sentinel /usr/local/bin/
mkdir /etc/redis
cd ..
cp redis.conf /etc/redis/6379.conf
vim /etc/redis/6379.conf

Edit the configuration:

mkdir /data/redis
mkdir /data/redis/run
mkdir /data/redis/log
mkdir /data/redis/db

vim /etc/redis/6379.conf
# Change
daemonize yes
pidfile /data/redis/run/redis_6379.pid
logfile /data/redis/log/redis_6379.log
dir /data/redis/db
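The edits only take effect if they land in the file that redis-server is started with (/etc/redis/6379.conf here). A quick grep of the active directives can confirm that. Below is a bash sketch; redis_conf_summary is a hypothetical helper name, and it lists only uncommented lines.

```shell
#!/usr/bin/env bash
# redis_conf_summary (hypothetical helper): print the active (uncommented)
# daemonize/pidfile/logfile/dir lines of a redis config file.
# Usage: redis_conf_summary /etc/redis/6379.conf
redis_conf_summary() {
  grep -E '^[[:space:]]*(daemonize|pidfile|logfile|dir)[[:space:]]' "$1"
}
```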

Start:

redis-server /etc/redis/6379.conf
redis-cli

Stop:

redis-cli shutdown

6. Install nginx

Install the dependencies:

yum -y install pcre-devel zlib-devel openssl openssl-devel
yum install nginx

Enable start at boot:

chkconfig --level 35 nginx on

Edit the configuration:

mkdir /data/log/nginx
vim /etc/nginx/nginx.conf
# Change the following:
error_log /data/log/nginx/error.log;
events {
    use   epoll;  # epoll is an I/O multiplexing method available only on Linux 2.6+ kernels; it can greatly improve nginx performance
    worker_connections  10240; # maximum number of simultaneous connections per worker process
}
# Inside the http { ... } block:
    gzip  on;
    upstream apibackend {
        server 127.0.0.1:8001 weight=1;
        server 127.0.0.1:8002 weight=1;
        server 127.0.0.1:8003 weight=1;
    }
    server {
        listen       7000; # listening port
        server_name  www.xx.com; # serve requests for www.xx.com
        # default location
        location / {
            root   html; # default site root directory
            index  index.html index.htm; # index file names
            # pass requests to the servers in the apibackend upstream
            proxy_pass http://apibackend;
            proxy_redirect  off;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            # timeout for connecting to an upstream server; it usually cannot exceed 75 seconds.
            # if one server is down, the request is passed to another one after 10 seconds
            proxy_connect_timeout 10;
        }
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }
    }



Start:

service nginx start




* Tools

yum -y install ncurses-devel libpcap-devel
wget http://www.ex-parrot.com/~pdw/iftop/download/iftop-0.17.tar.gz
tar -xzvf iftop-0.17.tar.gz
cd iftop-0.17
./configure
make
make install
iftop
iftop -i eth0 -f 'port domain'   # e.g. show only DNS traffic (port 53) on eth0


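* Limits and other settings

Add a 1024 MB swap file

Create and enable the swap file:
cd /var/
dd if=/dev/zero of=swapfile bs=4096 count=262144
chmod 600 swapfile    # swapon warns if the swap file is readable by other users
/sbin/mkswap swapfile
/sbin/swapon swapfile

Add it to /etc/fstab so that it is enabled automatically at boot:
vi /etc/fstab
Append the following line:
/var/swapfile swap swap defaults 0 0

Check that the swap file is active:

free -m

Compare with the earlier output: the Swap total should have grown by about 1024 MB.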

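Open file descriptor limit

==Important==

=="Waiting for the pending transfer to complete" error==

http://blog.51cto.com/12824426/2060594

==One unexpected case: if you set a value in limits.conf that is larger than the maximum the system allows (fs.nr_open) without raising that maximum first, ssh will no longer connect after you exit the terminal. If you still have a terminal session open, you can recover by editing the sshd configuration file.==

vim /etc/ssh/sshd_config

UsePAM yes
Change this yes to no, then restart the sshd service:

#systemctl restart sshd.service


You can now connect again. Raise the maximum the kernel allows (fs.nr_open), then change the sshd configuration back.


Check: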

cat /proc/sys/fs/file-max
3202602
cat /proc/sys/fs/nr_open
1048576
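The lockout described above happens when limits.conf asks for a nofile value larger than fs.nr_open. Below is a bash sketch of a pre-check to run before logging out. The helper names max_nofile and check_nofile are hypothetical. The parser assumes the four-column limits.conf format (domain, type, item, value) used in this article.

```shell
#!/usr/bin/env bash
# max_nofile (hypothetical helper): largest numeric nofile value in a limits.conf-style
# file, ignoring comment lines and non-numeric values such as "unlimited".
max_nofile() {
  awk '$1 !~ /^#/ && $3 == "nofile" && $4 ~ /^[0-9]+$/ && $4 + 0 > max { max = $4 + 0 }
       END { printf "%d\n", max }' "$1"
}

# check_nofile (hypothetical helper): fail if FILE asks for more open files than
# fs.nr_open allows; NR_OPEN defaults to the running kernel's value.
# Usage: check_nofile /etc/security/limits.conf [NR_OPEN]
check_nofile() {
  local want nr_open=${2:-$(cat /proc/sys/fs/nr_open)}
  want=$(max_nofile "$1")
  if [ "$want" -gt "$nr_open" ]; then
    echo "nofile $want exceeds fs.nr_open $nr_open: raise fs.nr_open first" >&2
    return 1
  fi
  echo "ok: nofile $want <= fs.nr_open $nr_open"
}
```

With the values shown above (nofile 1048576 and nr_open 1048576), `check_nofile /etc/security/limits.conf` passes. Any larger nofile value would fail until fs.nr_open is raised.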

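Check:
ulimit -n
Apply immediately without a reboot (current shell only):
ulimit -n 1048576
To make it permanent, edit the file (takes effect after a reboot):
vim /etc/security/limits.conf

Contents: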

* soft nofile 1048576
* hard nofile 1048576
@root  soft nofile 1048576
@root  hard nofile 1048576

vim /etc/sysctl.conf
# Add
net.core.somaxconn=32768

Apply it with:
sudo sysctl -p

Reboot the system.

Startup script

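#!/bin/bash
# Service script for the words services.
# $"..." strings are a bash feature, so the shebang names bash explicitly
# instead of relying on /bin/sh being bash.

name="words-server"

start() {
    echo -n $"Starting $name: "
    # words-service is managed by pm2, with stderr/stdout logs under logs/
    cd /data/nodejs/words-service/ || return 1
    pm2 start app.js --name words-service -e logs/log.err -o logs/log.out
    # words-socketio is started through its npm "start" script
    cd /data/nodejs/words-socketio/ || return 1
    npm start
    retval=$?    # exit status of npm start only
    echo -n $"retval $retval: "
    return $retval
}

stop() {
    echo -n $"Stopping $name: "
    cd /data/nodejs/words-service/ || return 1
    pm2 delete words-service
    cd /data/nodejs/words-socketio/ || return 1
    npm stop
    retval=$?    # exit status of npm stop only
    echo -n $"retval $retval: "
    return $retval
}

restart() {
    stop
    start
}

case "$1" in
    start|stop|restart)
        # call the function with the same name as the argument
        $1
        ;;
    *)
        echo $"Usage: $0 {start|stop|restart}"
        exit 2
        ;;
esac
exit $?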

L2 ops

1. Server specs

Amazon Linux AMI (Linux version 4.9.76-3.78.amzn1.x86_64) (Red Hat 7.2.1-2)
Mem 32G
CPU: 1 physical, 4 logical
Disk 300G

2. Initialization

# Set the root password
sudo passwd root
# Switch to root
su
# Update
yum -y update
yum -y install gcc make gcc-c++ wget pcre-devel zlib-devel openssl openssl-devel

3. Mount the data disk

https://docs.aws.amazon.com/zh_cn/AWSEC2/latest/UserGuide/ebs-using-volumes.html

lsblk
sudo file -s /dev/nvme1n1
sudo mkfs -t ext4 /dev/nvme1n1
sudo mkdir /data
sudo mount /dev/nvme1n1 /data
sudo cp /etc/fstab /etc/fstab.orig
df -h
sudo file -s /dev/nvme1n1
ls -al /dev/disk/by-uuid/
sudo vi /etc/fstab
# add a line like this one, using the UUID from the ls -al /dev/disk/by-uuid/ output above:
# UUID=8008d8f9-f8fd-4008-88c5-cc5fddc79b40       /data   ext4    defaults,nofail        0       2
df -h
sudo mount -a
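The fstab line above can be generated from the volume's UUID, so the mount does not depend on the device name (nvme1n1). Below is a bash sketch; fstab_entry is a hypothetical helper name. `blkid -s UUID -o value DEVICE` prints just the UUID of a device. The nofail option lets the instance keep booting if the volume is missing.

```shell
#!/usr/bin/env bash
# fstab_entry (hypothetical helper): build an /etc/fstab line that mounts a
# filesystem by UUID, with the same options as above (defaults,nofail 0 2).
# Usage: fstab_entry "$(sudo blkid -s UUID -o value /dev/nvme1n1)" /data ext4
fstab_entry() {
  local uuid=$1 mountpoint=$2 fstype=${3:-ext4}
  printf 'UUID=%s  %s  %s  defaults,nofail  0  2\n' "$uuid" "$mountpoint" "$fstype"
}
```

After appending the line to /etc/fstab, running sudo mount -a as above catches mistakes before the next reboot.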

4. Install nginx

Install the dependencies:

yum -y install pcre-devel zlib-devel openssl openssl-devel
rpm -Uvh http://nginx.org/packages/centos/7/noarch/RPMS/nginx-release-centos-7-0.el7.ngx.noarch.rpm
yum install nginx

Enable start at boot:

chkconfig --level 35 nginx on

Edit the configuration:

mkdir -p /data/log/nginx
vim /etc/nginx/nginx.conf
# Change to:
# For more information on configuration, see:
#   * Official English Documentation: http://nginx.org/en/docs/
#   * Official Russian Documentation: http://nginx.org/ru/docs/

user nginx;
worker_processes 8;
error_log /data/log/nginx/error.log;
pid /var/run/nginx.pid;

# Load dynamic modules. See /usr/share/doc/nginx/README.dynamic.
include /usr/share/nginx/modules/*.conf;

events {
    use   epoll;
    worker_connections 20480;
}

http {
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    #access_log  /data/log/nginx/access.log  main;

    sendfile            on;
    #tcp_nopush          on;
    tcp_nodelay         on;
    keepalive_timeout   3000;
    types_hash_max_size 20480;

    include             /etc/nginx/mime.types;
    default_type        application/octet-stream;

    # Load modular configuration files from the /etc/nginx/conf.d directory.
    # See http://nginx.org/en/docs/ngx_core_module.html#include
    # for more information.
    #include /etc/nginx/conf.d/*.conf;

    #index   index.html index.htm;

    map $http_upgrade $connection_upgrade {
        default upgrade;
        ''      close;
    }

    upstream wsbackend {
        hash $remote_addr consistent; # consistent hashing on the client IP, so each client keeps reaching the same backend
        server 127.0.0.1:3801 weight=1;
        server 127.0.0.1:3802 weight=1;
        server 127.0.0.1:3803 weight=1;
        server 127.0.0.1:3804 weight=1;
        server 127.0.0.1:3805 weight=1;
        server 127.0.0.1:3806 weight=1;
    }

    server {
        listen       38080;
        server_name  127.0.0.1;

        location / {
            proxy_pass http://wsbackend;
            proxy_set_header Host $host:$server_port;
            proxy_http_version 1.1;
            proxy_connect_timeout 60s; # setting 1
            proxy_read_timeout 3000s; # setting 2: idle connections are closed after this long; if connections still drop, try a longer value
            proxy_send_timeout 120s; # setting 3
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade; # from the map above: "upgrade", or "close" when the client sent no Upgrade header
        }
    }
}

Start:

service nginx start

5. Install Node.js

wget https://npm.taobao.org/mirrors/node/v9.10.1/node-v9.10.1.tar.gz
tar -zvxf node-v9.10.1.tar.gz
cd node-v9.10.1
./configure
make
make install
node -v
npm -v
npm install -g pm2

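6. Forcibly log out a user

# List logged-in users
w
# Kick a user out
Syntax: pkill -kill -t tty
Explanation:
pkill -kill -t: kills the processes attached to the given terminal
tty: the terminal of the user to kick (the TTY column in the w output)
Example: pkill -kill -t pts/2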
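To find which tty to pass to pkill, filter the who output by user name. Below is a bash sketch. ttys_of is a hypothetical helper, and it assumes the default who format, where the first column is the user and the second is the terminal.

```shell
#!/usr/bin/env bash
# ttys_of (hypothetical helper): list the terminals USER is logged in on,
# reading `who` output on stdin (column 1 = user, column 2 = tty).
# Usage: who | ttys_of USER
ttys_of() {
  awk -v u="$1" '$1 == u { print $2 }'
}
```

For example, `who | ttys_of ec2-user` lists that user's terminals. Each one can then be passed to pkill -kill -t.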

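Issues

1. Traffic spike

Description

Over the past few days, access to the overseas servers became extremely slow, and traffic monitoring showed traffic had reached 200M/s:

iftop -i eth0

The slowdown appeared to be caused by saturated bandwidth. The first options considered were:

a. Add a network interface so that data between the application servers and the data servers goes over the new NIC.

The advantage is that this solves the problem quickly and does not touch the business logic; only the data server connection settings change. The drawback is that it treats the symptom, not the cause: the excessive traffic still has to be fixed eventually. We adopted this option first as a temporary fix that keeps the business unaffected, then investigated the root cause step by step.

b. Find the root cause of the traffic spike, fix it, and check other services for similar risks.

The release log showed that data-tracking events had recently been added so that the operations team could analyze user behavior. A code review found that, with the large number of users, the tracking code's user queries returned far more data than needed, and it tracked every user without categorizing or filtering them.

Fixes:

Return only the necessary fields from queries, and check other services for the same problem.

Add a cache on the application servers that keeps a synchronized copy of key user data for business use, reducing direct data transfer with the data servers. Cache breakdown and cache avalanches must be prevented, so this needs careful analysis and design, real-world validation, and a smooth rollout so that users are not affected.

Scale out the data servers horizontally, etc.

c. Connect the tracking code to a separate analytics data server, apart from the one the business uses.

This option is simple: it only requires connecting to a different data server.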
