Problem
1) Application alert
Execution Timeout Couldn't get a connection within the time limit
2) mongod log
Jun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.122+0800 I NETWORK [initandlisten] connection accepted from 10.0.0.1:24321 #32 (32 connections now open)
Jun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.136+0800 I ACCESS [conn32] Successfully authenticated as principal __system on local
Jun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.349+0800 I - [rsSync] Assertion: 10334:BSONObj size: 0 (0x0) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO
Jun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.358+0800 I CONTROL [rsSync]
Jun 11 21:48:35 mongod mongod: 0x132c032 0x12c9988 0x12b29f8 0x12b2aac 0x9da659 0xae692f 0x106e4dc 0x1066b1e 0x1066d69 0xfd64f5 0xaea0fe 0xaea621 0xebe304 0xf563ae 0xf57c78 0xf4d29b 0x1b5c330 0x7efd30830dc5 0x7efd3055f73d
Jun 11 21:48:35 mongod mongod: ----- BEGIN BACKTRACE -----
Jun 11 21:48:35 mongod mongod: {"backtrace":[{"b":"400000","o":"F2C032","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"EC9988","s":"_ZN5mongo10logContextEPKc"},{"b":"400000","o":"EB29F8","s":"_ZN5mongo11msgassertedEiPKc"},{"b":"400000","o":"EB2AAC"},{"b":"400000","o":"5DA659","s":"_ZNK5mongo7BSONObj14_assertInvalidEv"},{"b":"400000","o":"6E692F","s":"_ZN5mongo10Collection19aboutToDeleteCappedEPNS_16OperationContextERKNS_8RecordIdENS_10RecordDataE"},{"b":"400000","o":"C6E4DC","s":"_ZN5mongo19CappedRecordStoreV111allocRecordEPNS_16OperationContextEib"},{"b":"400000","o":"C66B1E","s":"_ZN5mongo17RecordStoreV1Base13_insertRecordEPNS_16OperationContextEPKcib"},{"b":"400000","o":"C66D69","s":"_ZN5mongo17RecordStoreV1Base12insertRecordEPNS_16OperationContextEPKcib"},{"b":"400000","o":"BD64F5","s":"_ZN5mongo11RecordStore13insertRecordsEPNS_16OperationContextEPSt6vectorINS_6RecordESaIS4_EEb"},{"b":"400000","o":"6EA0FE","s":"_ZN5mongo10Collection16_insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_b"},{"b":"400000","o":"6EA621","s":"_ZN5mongo10Collection15insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_bb"},{"b":"400000","o":"ABE304","s":"_ZN5mongo4repl15writeOpsToOplogEPNS_16OperationContextERKSt6vectorINS_7BSONObjESaIS4_EE"},{"b":"400000","o":"B563AE","s":"_ZN5mongo4repl8SyncTail10multiApplyEPNS_16OperationContextERKNS1_7OpQueueE"},{"b":"400000","o":"B57C78","s":"_ZN5mongo4repl8SyncTail16oplogApplicationEPNS0_16StorageInterfaceE"},{"b":"400000","o":"B4D29B","s":"_ZN5mongo4repl13runSyncThreadEv"},{"b":"400000","o":"175C330","s":"execute_native_thread_routine"},{"b":"7EFD30829000","o":"7DC5"},{"b":"7EFD30468000","o":"F773D","s":"clone"}],"processInfo":{ "MongoDBVersion" : "3.2.12", "gitVersion" : "ef3e1bc78e997f0d9f22f45aeb1d8e3b6ac14a14", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-514.6.2.el7.x86_64", "version" : "#1 SMP Thu Feb 23 03:04:39 UTC 2017", "machine" : "x86_64" }, "somap" : [ {
Jun 11 21:48:35 mongod mongod: "elfType" : 2, "b" : "400000", "buildId" : "BAAC1A970F6D0F06B88D0DE75BF06E4C260939EC" }, { "b" : "7FFE4D008000", "elfType" : 3, "buildId" : "7112DB1E073B211AB0CF8DE793F40F3FF996F5B4" }, { "b" : "7EFD31753000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "90EAF65D9B0EEEB1424241281F7F197451D4317D" }, { "b" : "7EFD31369000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "7278C69EE161D98DDD0FA00F92B67AD78C7B7F40" }, { "b" : "7EFD31161000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "82E77ADE22BC9FFF8D3458BD37331E7EDF174C28" }, { "b" : "7EFD30F5D000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "C5F560504E1AF52E29679C3B52FF11121015D6BB" }, { "b" : "7EFD30C5B000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "721C7CC9488EFA25F83B48AF713AB27DBE48EF3E" }, { "b" : "7EFD30A45000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "408B46E291B2D4C9612E27C0509D165D7E186D40" }, { "b" : "7EFD30829000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "C3DEB1FA27CD0C1C3CC575B944ABACBA0698B0F2" }, { "b" : "7EFD30468000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "8B2C421716985B927AA0CAF2A05D0B1F452367F7" }, { "b" : "7EFD319C1000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "8F3E366E2DB73C330A3791DEAE31AE9579099B44" }, { "b" : "7EFD3021A000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "A2499C359AA179EE23324ED949C0E508E4434F10" }, { "b" : "7EFD2FF33000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "E09A34D9083DC6FEAF7018C09D55631DEEE2836D" }, { "b" : "7EFD2FD2F000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "BF54B7C8932E450769FBBB8B18864D1DD70BBC67" }, { "b" : "7EFD2FAFD000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "BF8F00D7CB849ADB0B7A4703BC7B8D66AEE6A49C" }, { "b" : "7EFD2F8E7000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "EA8E45DC8E395CC5E26890470112D97A1F1E0B65" }, { "b" : "7EFD2F6D8000", "path" : "/lib64
Jun 11 21:48:35 mongod mongod: /libkrb5support.so.0", "elfType" : 3, "buildId" : "1E7A92FDD6FB3871DA97F4BCA2E147E72B6B6E1F" }, { "b" : "7EFD2F4D4000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "7EFD2F2BA000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "FE7AE845A123A3DFC0FDC2408BCBC2BA8B61B158" }, { "b" : "7EFD2F093000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "76687CA31A406854DF3BCF8D03055656F56E6892" }, { "b" : "7EFD2EE32000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "AE64AA461A26E01F60408013D361749D56DD0AE1" } ] }}
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x132c032]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10logContextEPKc+0x138) [0x12c9988]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo11msgassertedEiPKc+0x88) [0x12b29f8]
Jun 11 21:48:35 mongod mongod: mongod(+0xEB2AAC) [0x12b2aac]
Jun 11 21:48:35 mongod mongod: mongod(_ZNK5mongo7BSONObj14_assertInvalidEv+0x3B9) [0x9da659]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10Collection19aboutToDeleteCappedEPNS_16OperationContextERKNS_8RecordIdENS_10RecordDataE+0xBF) [0xae692f]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo19CappedRecordStoreV111allocRecordEPNS_16OperationContextEib+0x46C) [0x106e4dc]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo17RecordStoreV1Base13_insertRecordEPNS_16OperationContextEPKcib+0x5E) [0x1066b1e]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo17RecordStoreV1Base12insertRecordEPNS_16OperationContextEPKcib+0xA9) [0x1066d69]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo11RecordStore13insertRecordsEPNS_16OperationContextEPSt6vectorINS_6RecordESaIS4_EEb+0xB5) [0xfd64f5]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10Collection16_insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_b+0x16E) [0xaea0fe]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10Collection15insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_bb+0x1B1) [0xaea621]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl15writeOpsToOplogEPNS_16OperationContextERKSt6vectorINS_7BSONObjESaIS4_EE+0x144) [0xebe304]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl8SyncTail10multiApplyEPNS_16OperationContextERKNS1_7OpQueueE+0x98E) [0xf563ae]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl8SyncTail16oplogApplicationEPNS0_16StorageInterfaceE+0xD08) [0xf57c78]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl13runSyncThreadEv+0x2BB) [0xf4d29b]
Jun 11 21:48:35 mongod mongod: mongod(execute_native_thread_routine+0x20) [0x1b5c330]
Jun 11 21:48:35 mongod mongod: libpthread.so.0(+0x7DC5) [0x7efd30830dc5]
Jun 11 21:48:35 mongod mongod: libc.so.6(clone+0x6D) [0x7efd3055f73d]
Jun 11 21:48:35 mongod mongod: ----- END BACKTRACE -----
Jun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.389+0800 F - [rsSync] terminate() called. An exception is active; attempting to gather more information
Jun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.393+0800 F - [rsSync] DBException::toString(): 10334 BSONObj size: 0 (0x0) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO
Jun 11 21:48:35 mongod mongod: Actual exception type: mongo::MsgAssertionException
Jun 11 21:48:35 mongod mongod: 0x132c032 0x132bb82 0x1b14356 0x1b14383 0xf4d3e0 0x1b5c330 0x7efd30830dc5 0x7efd3055f73d
Jun 11 21:48:35 mongod mongod: ----- BEGIN BACKTRACE -----
I. Cluster environment
3 mongos, 3 mongo config servers, and 3 shards (each shard has three machines: one primary and two secondaries)
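II. Incident description
Incident 1: During a promotional event, high concurrency drove up the number of mongod connections and, with it, memory consumption (each incoming connection is allocated some memory). Queries slowed down across the board and a large backlog built up (the application reported timeouts and similar errors). We restarted mongod directly, without running stepDown first, and afterwards one of the secondaries, node 2, showed the error. We temporarily removed that node, planning to fix it later when time allowed (cause unknown).
Incident 2: Six days after the first incident, node 2 had still not been repaired. Primary node 1 in the same shard unexpectedly hit the same error (cause unknown). That left the shard with only one primary node, so we stayed up overnight to repair both nodes urgently.
Incident 3: Two days after the second incident, primary node 3 in the same shard hit the same error again (cause unknown). That was a cold-sweat moment: every node in the shard had now failed in turn. Fortunately, each failure was fixed in time.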
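The first incident involved restarting mongod without stepping down first. A commonly recommended order for restarting a replica set is to restart the secondaries first, then step down the primary and restart it last. The sketch below only illustrates that ordering in plain JavaScript (runnable in Node.js). The function `restartPlan`, the member roles, and the third host name are assumptions made for illustration; the input merely imitates the shape of the `members` array returned by `rs.status()`.

```javascript
// Hypothetical helper: order the restart steps for one replica set.
// Input imitates the "members" array of rs.status() (name + stateStr only).
function restartPlan(members) {
  const secondaries = members.filter((m) => m.stateStr === "SECONDARY");
  const primaries = members.filter((m) => m.stateStr === "PRIMARY");

  // Secondaries can be restarted one at a time without an election.
  const steps = secondaries.map((m) => `restart ${m.name}`);

  // The primary hands over its role before it is restarted.
  for (const p of primaries) {
    steps.push(`rs.stepDown() on ${p.name}`, `restart ${p.name}`);
  }
  return steps;
}

// Roles and the third host name are made up for this example.
const plan = restartPlan([
  { name: "mongod121:10000", stateStr: "SECONDARY" },
  { name: "mongod123:10000", stateStr: "PRIMARY" },
  { name: "mongod122:10000", stateStr: "SECONDARY" },
]);
console.log(plan.join("\n"));
```

Whether skipping stepDown actually caused the corruption here is not established; the article itself lists the cause as unknown.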
III. Resolution steps
1. Judging from the error message, this looked like a corrupted data block. The shard could not perform updates, but queries worked normally.
2. Locating the problem: oplog.rs was corrupted
> db
local
> db.oplog.rs.find()
error: { "$err" : "BSONObj size: 1852142352 (0x1073656E) is invalid. Size must be between 0 and 16793600(16MB) First element: Status: ?type=100", "code" : 10334 }
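One way to read the bogus size in this error: a BSON document starts with a 4-byte little-endian length, and the hex value in the message appears to show those four bytes in storage order. The sketch below, in plain JavaScript (runnable in Node.js, with helper names invented for illustration), decodes the bytes. Three of the four turn out to be printable ASCII, which hints that the reader landed inside string data rather than on a document header. That reading is an interpretation, not something the log states.

```javascript
// Hypothetical helpers for illustration; none of these are MongoDB APIs.

// Split a hex string such as "1073656E" into byte values, in the order written.
function hexToBytes(hex) {
  const bytes = [];
  for (let i = 0; i < hex.length; i += 2) {
    bytes.push(parseInt(hex.slice(i, i + 2), 16));
  }
  return bytes;
}

// Read four bytes as a little-endian int32, the way a BSON length prefix is stored.
function littleEndianInt32(bytes) {
  const view = new DataView(new ArrayBuffer(4));
  bytes.forEach((b, i) => view.setUint8(i, b));
  return view.getInt32(0, true);
}

// Show printable ASCII bytes as characters and everything else as ".".
function printable(bytes) {
  return bytes
    .map((b) => (b >= 0x20 && b < 0x7f ? String.fromCharCode(b) : "."))
    .join("");
}

const sizeBytes = hexToBytes("1073656E"); // hex copied from the error above
const maxInternalSize = 16 * 1024 * 1024 + 16 * 1024; // 16793600, the limit in the message

console.log(littleEndianInt32(sizeBytes)); // 1852142352
console.log(printable(sizeBytes)); // .sen
console.log(littleEndianInt32(sizeBytes) > maxInternalSize); // true
```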
3. Rebuild the oplog
1) Find the most recent oplog entry
> db.oplog.rs.find( { }, { ts: 1, h: 1 } ).sort( {$natural : -1} ).limit(1).next()
{"ts" : Timestamp(1497172747, 46), "h" : NumberLong("4489544342319430008")}
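The first argument of the `Timestamp` in that entry is a count of seconds since the Unix epoch, and the second is an ordinal for operations within the same second. As a small sanity check before relying on the entry, the seconds can be converted to a readable date. This is a plain JavaScript sketch (runnable in Node.js); `oplogSecondsToIso` is a made-up helper name.

```javascript
// Hypothetical helper: convert the seconds part of an oplog Timestamp
// into an ISO 8601 string in UTC.
function oplogSecondsToIso(seconds) {
  return new Date(seconds * 1000).toISOString();
}

// Seconds value copied from the entry above.
console.log(oplogSecondsToIso(1497172747)); // 2017-06-11T09:19:07.000Z
```

The result is in UTC; the log lines in this article use GMT+0800, so the same instant reads eight hours later there.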
2) Save the entry
> db.temp.save(db.oplog.rs.find( { }, { ts: 1, h: 1 } ).sort( {$natural : -1} ).limit(1).next() )
// Verify that the entry was saved; this is important
> db.temp.find()
3) Drop oplog.rs; the physical files are not deleted
> db.oplog.rs.drop()
4) Create a new oplog
> db.runCommand( { create: "oplog.rs", capped: true, size: (50 * 1024 * 1024 * 1024) } )
{ "ok" : 0, "errmsg" : "not authorized on local to execute command { create: \"oplog.rs\", capped: true, size: 53687091200.0 }", "code" : 13 }
// Frustratingly, recreating the capped collection failed with a permissions error. oplog.rs must be a capped collection, yet we were already using root, the highest privilege.
Change the shardsvr role to the configsvr role, remove keyfile authentication (which effectively turns off access control), restart the service, and run the command again
configsvr> db.runCommand( { create: "oplog.rs", capped: true, size: (50 * 1024 * 1024 * 1024) } );
{ "ok" : 1 }
Why change the role?
-- Because with the shardsvr role, the service would not start once keyfile authentication was removed. Sigh.
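The `size` passed to `create` in step 4 is a byte count, which is why the error message echoes it back as 53687091200.0. The arithmetic can be checked with a short plain JavaScript sketch (runnable in Node.js); `gibToBytes` is a hypothetical helper, not a shell built-in.

```javascript
// Hypothetical helper: convert a size in GiB to bytes for a capped collection.
function gibToBytes(gib) {
  return gib * 1024 * 1024 * 1024;
}

const oplogBytes = gibToBytes(50);
console.log(oplogBytes); // 53687091200
// The value is well within the range a JavaScript number represents exactly.
console.log(Number.isSafeInteger(oplogBytes)); // true
```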
5) Write the most recent entry, saved earlier, into the new oplog
> db.oplog.rs.save( db.temp.findOne() )
6) Verify the result
> db.oplog.rs.find()
{"ts" : Timestamp(1497172747, 46), "h" : NumberLong("4489544342319430008")}
7) Edit the configuration file to restore the original settings, restart the service, and rejoin the node to the cluster. Then check the replication lag
rs:SECONDARY> db.printSlaveReplicationInfo();
source: mongod122:10000
syncedTo: Mon Jun 12 2017 03:11:41 GMT+0800 (CST)
0 secs (0 hrs) behind the primary
source: mongod121:10000
syncedTo: Sun Jun 11 2017 23:52:12 GMT+0800 (CST)
11969 secs (3.32 hrs) behind the primary
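The lag figures in this output can be reproduced from the two `syncedTo` times. Because mongod122 is reported as 0 seconds behind, its `syncedTo` is taken here as the primary's latest operation time; that is an assumption of this sketch. The code is plain JavaScript (runnable in Node.js), and `lagSeconds` is a made-up helper name.

```javascript
// Hypothetical helper: seconds between two ISO 8601 timestamps.
function lagSeconds(aheadIso, behindIso) {
  return (Date.parse(aheadIso) - Date.parse(behindIso)) / 1000;
}

// syncedTo values from the output above, rewritten as ISO 8601 with the +08:00 offset.
const lag = lagSeconds("2017-06-12T03:11:41+08:00", "2017-06-11T23:52:12+08:00");

console.log(lag); // 11969
console.log((lag / 3600).toFixed(2)); // 3.32
```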
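Once the lagging node caught up, everything returned to normal.
To this day, we still do not know what caused this shard to hit this error so often. I will post an update if there is any progress.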