piyushm
May 18 2009, 04:37 PM
If the DGE database encounters an error (see the example entries from error.log below) due to an improper shutdown or power failure, running out of disk space, or improper access by an external application, you may need to repair the database before new performance data and events can be recorded:
Examples:
2009-05-18 11:17:30,438 message.RemoveExpiredMessagesFromLiveDb
[ThreadPool[PooledCommandRunner$PooledCommandRunnerHelper]]:
(ERROR) Failed to execute command: Table '.\liveeventsdb\duplicatemessageinfo'
is marked as crashed and should be repaired
2009-09-21 12:27:09,109 monitor.AggregatedDataDbWriter[aggregation-writer-Manager]:
(ERROR) Error writing objects: Incorrect key file for table
'.\aggregateddatadb\aggregationdatascheme2.MYI'; try to repair it
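To find out which tables are affected without reading the log by hand, the two error signatures shown above can be searched for directly. The following is a minimal Python sketch, not part of Traverse; the function name find_damaged_tables is hypothetical, and the patterns are taken only from the two example entries, so other corruption messages may exist that it does not catch.

```python
import re

# Signatures copied from the example log entries above. Whitespace is
# matched loosely because long entries may wrap across several lines.
CRASHED = re.compile(r"Table\s+'([^']+)'\s+is\s+marked\s+as\s+crashed")
BAD_KEY = re.compile(r"Incorrect\s+key\s+file\s+for\s+table\s+'([^']+)'")


def find_damaged_tables(log_text):
    """Return a sorted list of table paths named in corruption errors.

    Hypothetical helper: it only recognises the two message forms
    shown in the examples above.
    """
    tables = set()
    for pattern in (CRASHED, BAD_KEY):
        tables.update(pattern.findall(log_text))
    return sorted(tables)
```

To use it, read the contents of error.log (assumed here to be in the <TRAVERSE_HOME>/logs directory) into a string and pass that string to find_damaged_tables.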
On Windows platforms, anti-virus software (McAfee, Symantec, etc.) can cause similar corruption through "on-access scan" of the database tables/files while information is being written to them. If anti-virus software is installed on the Traverse server, it should be configured to exclude the <TRAVERSE_HOME>/database directories. Similarly, scheduled backup tasks should be configured to skip the database directory.
If the DGE is unable to write to its local database, you may notice missing performance data when you drill down into a test being monitored from that DGE. To repair the DGE database, follow these steps:
1. Stop the Traverse components using Start -> All Programs -> Zyrion Traverse -> Stop Zyrion Traverse on Windows, or "etc/traverse.init stop" on Linux/Solaris.
2. (Optional) Back up the log files in the <TRAVERSE_HOME>/logs directory to another folder, then delete them from the <TRAVERSE_HOME>/logs directory.
3. Perform the database repair using Start -> All Programs -> Zyrion Traverse -> Database Management -> DGE Database Repair on Windows. On Linux/Solaris, run "utils/db_repair.sh".
4. Start the Traverse components.
5. Once the components are up, check the database.log and error.log files to confirm that the errors are no longer being recorded.
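If the old log files were not cleared before the repair, the final check can be limited to entries written after the restart. The following is a rough Python sketch under stated assumptions: the function name corruption_errors_since is hypothetical, every log entry is assumed to begin with a timestamp in the form shown in the examples (2009-05-18 11:17:30,438), and only the two error messages quoted above are recognised.

```python
import re
from datetime import datetime

# Assumption: each log entry starts with a timestamp in this form.
TIMESTAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})", re.MULTILINE
)
# The two corruption messages quoted in the examples above.
CORRUPTION = re.compile(r"marked\s+as\s+crashed|Incorrect\s+key\s+file")


def corruption_errors_since(log_text, since):
    """Return timestamps of corruption errors logged at or after `since`.

    Hypothetical helper: an entry runs from one timestamp to the next,
    so messages wrapped over several lines are still matched.
    """
    starts = list(TIMESTAMP.finditer(log_text))
    hits = []
    for i, match in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(log_text)
        entry = log_text[match.start():end]
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if stamp >= since and CORRUPTION.search(entry):
            hits.append(stamp)
    return hits
```

Pass the contents of error.log and the time at which the components were restarted; an empty list suggests that the repair worked, at least for the error messages this sketch knows about.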