[Cloud Security Technology and Products Section] Best Practices for Event Monitoring in CloudMonitor

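Use cases
While a service is running, exceptions are inevitable. Some can be recovered automatically through retries or similar means, some cannot, and severe ones can even interrupt customers' business. We therefore need a system that records these exceptions and triggers an alarm when specific conditions are met. The traditional approach is to write file logs and collect them into a dedicated system such as the open-source ELK stack (Elasticsearch, Logstash, Kibana). Such open-source systems usually consist of several complex distributed systems, so maintaining them yourself means a high technical barrier and high cost. CloudMonitor provides an event monitoring feature that addresses these problems well.
The examples below briefly show how to use the event monitoring feature.

Hands-on example
Part 1: Report exceptions
Event monitoring offers two ways to report data: the Java SDK and the Open API. This post covers reporting through the Java SDK.
Step 1: Add the Maven dependency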
    <dependency>
        <groupId>com.aliyun.openservices</groupId>
        <artifactId>aliyun-cms</artifactId>
        <version>0.1.2</version>
    </dependency>

Step 2: Initialize the SDK
    // 118 is the ID of a CloudMonitor application group, which lets you classify events
    // per application. You can look up the group ID in the CloudMonitor application group list.
    CMSClientInit.groupId = 118L;
    // This address is the reporting endpoint of the event system; currently it is a public
    // network address. accesskey and secretkey are used for identity authentication.
    CMSClient c = new CMSClient("https://metrichub-cms-cn-hangzhou.aliyuncs.com", accesskey, secretkey);
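
Step 3: Decide whether to report asynchronously
By default, CloudMonitor events are reported synchronously. The advantages are simple code and reliable delivery: every event is reported and no data is lost.
Synchronous reporting also brings problems, however. Because the reporting code is embedded in business code, network jitter can block execution and affect normal business. Many scenarios do not require 100% reliable event delivery, so a simple asynchronous wrapper is enough: write events to a LinkedBlockingQueue, then report them in batches in the background through a ScheduledExecutorService.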
    // Initialize the queue and the executor. The enclosing class implements Runnable,
    // and cmsClient is the CMSClient created in Step 2.
    private LinkedBlockingQueue<EventEntry> eventQueue = new LinkedBlockingQueue<EventEntry>(10000);
    private ScheduledExecutorService schedule = Executors.newSingleThreadScheduledExecutor();

    // Report an event.
    // Every event has a name and a content. The name identifies the event; the content
    // holds its details and supports full-text search.
    public void put(String name, String content) {
        EventEntry event = new EventEntry(name, content);
        // When the queue is full the event is simply dropped. Adjust this policy to your needs.
        boolean b = eventQueue.offer(event);
        if (!b) {
            logger.warn("Event queue is full, dropping event: {}", event);
        }
    }

    // Submit events asynchronously: start a scheduled task that calls run() once per second
    // to report events in batches. Adjust the interval to your needs.
    schedule.scheduleAtFixedRate(this, 1, 1, TimeUnit.SECONDS);

    public void run() {
        // Keep sending batches while more than 500 events are still queued.
        do {
            batchPut();
        } while (this.eventQueue.size() > 500);
    }

    private void batchPut() {
        // Take up to 99 events from the queue for one batch upload.
        List<CustomEvent> events = new ArrayList<CustomEvent>();
        for (int i = 0; i < 99; i++) {
            EventEntry e = this.eventQueue.poll();
            if (e == null) {
                break;
            }
            events.add(CustomEvent.builder().setContent(e.getContent()).setName(e.getName()).build());
        }
        if (events.isEmpty()) {
            return;
        }
        // Upload the batch to CloudMonitor. There is no retry here and the SDK does not
        // retry either; add your own retry policy if you need high reliability.
        try {
            CustomEventUploadRequestBuilder builder = CustomEventUploadRequest.builder();
            builder.setEventList(events);
            CustomEventUploadResponse response = cmsClient.putCustomEvent(builder.build());
            if (!"200".equals(response.getErrorCode())) {
                logger.warn("Event upload error: msg: {}, rid: {}", response.getErrorMsg(), response.getRequestId());
            }
        } catch (Exception e1) {
            logger.error("Event upload exception", e1);
        }
    }
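
The code above notes that neither the wrapper nor the SDK retries a failed upload. If higher reliability is needed, one option is a small retry helper with exponential backoff. The following is only a minimal sketch under assumptions: the class name `RetryingUploader`, the method `uploadWithRetry`, and its parameters are hypothetical and not part of the SDK, and the upload itself is passed in as a `Callable<Boolean>` so that the sketch depends only on the JDK.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of the SDK): retries an upload attempt with exponential backoff. */
final class RetryingUploader {
    private RetryingUploader() {}

    /**
     * Runs the attempt up to maxAttempts times. The attempt returns true on success;
     * a false return value or a thrown exception counts as a failed attempt.
     * The wait between attempts starts at initialDelayMillis and doubles each time.
     */
    static boolean uploadWithRetry(Callable<Boolean> attempt, int maxAttempts, long initialDelayMillis)
            throws InterruptedException {
        long delay = initialDelayMillis;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                if (Boolean.TRUE.equals(attempt.call())) {
                    return true;
                }
            } catch (InterruptedException ie) {
                throw ie; // let the caller handle interruption
            } catch (Exception e) {
                // Any other exception is treated as a failed attempt and retried.
            }
            if (i < maxAttempts) {
                Thread.sleep(delay);
                delay *= 2;
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Simulated upload that succeeds on the third attempt.
        boolean ok = uploadWithRetry(() -> ++calls[0] >= 3, 5, 10L);
        System.out.println("success=" + ok + ", attempts=" + calls[0]);
    }
}
```

Inside batchPut, the attempt could wrap the existing upload call, for example `() -> "200".equals(cmsClient.putCustomEvent(request).getErrorCode())`. Keep in mind that retrying on the single scheduler thread delays the following batches, so the attempt count and delay should stay small.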
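
Step 4: Event reporting demos

Demo 1: Monitoring exceptions in an HTTP controller

The main goal is to monitor whether HTTP requests produce a large number of exceptions and to raise an alarm when the number of exceptions per minute exceeds a threshold. The idea is to intercept HTTP requests with a Spring interceptor, a servlet filter, or a similar technique, record an event whenever an exception occurs, and finally configure an alarm rule to trigger the alarm.
A demo of reporting the event: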
    // Each event should carry rich information to help us search for and locate problems.
    // Here a map organizes the event, and it is converted to JSON as the event content.
    Map<String, String> eventContent = new HashMap<String, String>();
    eventContent.put("method", "GET");  // HTTP request method
    eventContent.put("path", "/users"); // HTTP path
    eventContent.put("exception", e.getClass().getName()); // exception class name, convenient for searching
    eventContent.put("error", e.getMessage()); // exception error message
    eventContent.put("stack_trace", ExceptionUtils.getStackTrace(e)); // exception stack trace, convenient for locating the problem
    // Finally, submit the event with the asynchronous wrapper shown earlier. Reporting is
    // asynchronous and has no retry, so events may occasionally be lost, but this is good
    // enough for alarming on unknown HTTP exceptions.
    put("http_error", JsonUtils.toJson(eventContent));

![image.png](http://ata2-img.cn-hangzhou.img-pub.aliyun-inc.com/864cf095977cf61bd340dd1461a0247c.png)
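
The demo relies on two helpers that the post does not show, ExceptionUtils.getStackTrace and JsonUtils.toJson. For readers who do not have equivalent utilities at hand, here is a minimal JDK-only sketch of the same idea. The class `EventContentSketch` and its methods are hypothetical names introduced for illustration, and the JSON writer handles only a flat map of strings; a real project would normally use an established JSON library instead.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical JDK-only helpers for building the content of an exception event. */
final class EventContentSketch {
    private EventContentSketch() {}

    /** Renders the stack trace of a throwable as a string. */
    static String stackTraceOf(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }

    /** Quotes a string as a JSON string literal; a null value becomes JSON null. */
    static String quote(String s) {
        if (s == null) {
            return "null";
        }
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Serializes a flat string map as a JSON object, keeping the map's iteration order. */
    static String toJson(Map<String, String> map) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> entry : map.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append(quote(entry.getKey())).append(':').append(quote(entry.getValue()));
        }
        return sb.append('}').toString();
    }

    /** Builds the same fields as the demo above for a failed HTTP request. */
    static Map<String, String> httpErrorContent(String method, String path, Throwable e) {
        Map<String, String> content = new LinkedHashMap<>();
        content.put("method", method);
        content.put("path", path);
        content.put("exception", e.getClass().getName());
        content.put("error", e.getMessage()); // may be null for exceptions without a message
        content.put("stack_trace", stackTraceOf(e));
        return content;
    }

    public static void main(String[] args) {
        Throwable e = new IllegalStateException("user store unavailable");
        System.out.println(toJson(httpErrorContent("GET", "/users", e)));
    }
}
```

Note that e.getMessage() can return null, so whichever JSON helper is used should cope with null values.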
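
Demo 2: Monitoring background scheduled jobs and message consumption

As with the HTTP events above, many similar business scenarios need alarms, such as background jobs and message-queue consumption. They can all be monitored by reporting events in the same way, so that you are notified as soon as an exception occurs.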
    // Organizing an event for the message queue:
    Map<String, String> eventContent = new HashMap<String, String>();
    eventContent.put("cid", consumerId);  // identity of the consumer
    eventContent.put("mid", msg.getMsgId()); // message ID
    eventContent.put("topic", msg.getTopic()); // message topic
    eventContent.put("body", body); // message body
    eventContent.put("reconsume_times", String.valueOf(msg.getReconsumeTimes())); // number of retries after failed consumption
    eventContent.put("exception", e.getClass().getName()); // class name of the exception that occurred
    eventContent.put("error", e.getMessage()); // exception message
    eventContent.put("stack_trace", ExceptionUtils.getStackTrace(e)); // exception stack trace
    // Finally, report the event
    put("metaq_error", JsonUtils.toJson(eventContent));
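
After reporting, view the events:
Set an alarm for message-consumption exceptions:

Demo 3: Recording important events

Events have another use: recording important business occurrences that need no alarm, so that they can be reviewed later. Examples include operation logs for important business, password changes, order modifications, and logins from an unusual location.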
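
The post gives no code for this demo, so the following is only a sketch of how such a record might be organized, following the same map-based pattern as the earlier demos. Everything here is an assumption made for illustration: the class `AuditEventSketch`, the field names, and the sample values are hypothetical, and the resulting map would still need to be converted to JSON and passed to the put wrapper under an event name of your choosing.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that assembles the content of an operation-log event. */
final class AuditEventSketch {
    private AuditEventSketch() {}

    /** Collects who did what to which target, from where, and when. */
    static Map<String, String> auditContent(String operator, String action, String target,
                                            String sourceIp, Instant when) {
        Map<String, String> content = new LinkedHashMap<>();
        content.put("operator", operator);   // account that performed the operation
        content.put("action", action);       // e.g. change_password, modify_order
        content.put("target", target);       // object the operation was applied to
        content.put("source_ip", sourceIp);  // useful for spotting logins from unusual locations
        content.put("time", when.toString()); // ISO-8601 timestamp in UTC
        return content;
    }

    public static void main(String[] args) {
        Map<String, String> content =
                auditContent("alice", "change_password", "account:alice", "203.0.113.7", Instant.now());
        System.out.println(content);
    }
}
```

Because these events are only reviewed later, no alarm rule is needed for them; the fields mainly serve full-text search.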

[ This post was re-edited by 反向一觉 on 2017-10-31 10:44 ]