PDA

View Full Version : Cleaning up spikes in data



krisztian
January 18 2011, 04:31 PM
I'm wondering if there is a tool that we can use to clean up spikes in the historical data. From time to time we get tests where for whatever reason a giant spike occurs (most likely polling errors or counter rollovers or even just simply an error in the application). I know that the clean way to deal with them is to set a proper 'maxvalue' for the particular test, but what to do with the ones that already occurred? They make the long term history page useless for these tests....
Thanks,
Krisztian

rajib
January 19 2011, 03:18 PM
Hi Krisztian,



The product team has discussed the possibility of creating a tool to edit historical data, but you can manually alter the results in the database with minimal effort.
Identify the serial number for the test in question, either from the URL (testId=...) when drilling down into the test details page, or via the BVE API using "test.list" command (serialNumber=...)
Connect to the DGE database using:


cd /usr/local/traverse
etc/dgedb.init admin dge
Locate the time-slots currently in use for this test in the DGE database tables:



mysql> select id from AggregationInfo where testSerialNumber=nnnnn;
where "nnnnn" is the test serial number you identified in step (a). This should produce output with 4 numbers:



+------+
| id |
+------+
| aaaa | --> represents data in 24-hr graph
| bbbb | --> represents data in weekly graph
| cccc | --> represents data in monthly graph
| dddd | --> represents data in yearly graph
+------+
Make sure that the data point you are trying to reset can be located:



mysql> select * from AggregationDataScheme0 where
aggregationInfoId=zzzz and maximumValue > 200000\G
In this case the offending data point has a value larger than 200000. If you are trying to fix data in the weekly graph, you would set zzzz = bbbb and use AggregationDataScheme1 for table name. The output should help you ensure that the query is going to match data that you are interested in and not valid results:



*************************** 1. row ***************************
aggregationInfoId: zzzz
timeSlot: qqqq
numSamples: 1
average: 200001
minimumValue: 200001
maximumValue: 200001

Instead of "maximumValue", you can also create the query based on "average" or "minimumValue". Again, make sure that the rows matched have the data you are correcting. Once the update is applied, it cannot be changed back

Now you can update the polled result with 0, or -5 (which will be treated as NO DATA when displaying the graphs):



mysql> update AggregationDataScheme0 set numSamples=0,
average=-5, minimumValue=-5, maximumValue=-5, maxSeverity=8,
lastTimestamp=0 where maximumValue > 200000 and aggregationInfoId=zzzzz;
You can repeat steps 4 and 5 for each aggregation period (weekly, monthly, etc). Remember to use AggregationDataScheme0 for daily, AggregationDataScheme1 for weekly, AggregationDataScheme2 for monthly and AggregationDataScheme3 for yearly graph data.
The graphs should reflect the change immediately (make sure to clear your browser cache). Please let me know if you encounter any difficulty.

- Rajib

mgh4
January 24 2011, 01:36 PM
Rajib,

I've attached fixdata.zip. A perl script I use to clean up data.

You might want to clean up what I've done, especially where I've called other scripts of my own, but it's something for others to work from. I've used it for years.

This problem occurs A LOT, and doing the steps you gave is just way to tedious and prone to typing mistakes.

Mike

rajib
January 24 2011, 06:09 PM
Thanks a lot for sharing the tool Mike. If you are noticing large number of invalid/high values, our support team would be happy to analyze the data, in case the device(s) in question need special handling.

- Rajib

estaszko
May 18 2011, 07:18 AM
We have found that this symptom almost always happens with a device restart.
Is there a way to ignore the first data point, after having an uptime reset event?