[IGSMAIL-6878] Outages and delays impacting IGS Ultra-rapid and Rapid products over the weekend
Jake Griffiths - NOAA Federal
jake.griffiths at noaa.gov
Mon Mar 10 11:18:23 PDT 2014
Author: Analysis Center Coordinator (ACC)
Dear IGS Colleagues:
The computer servers performing the IGS combinations experienced a network
outage that lasted approximately 4.75 hours, starting sometime after 07:01
(EST) on Saturday, March 8, 2014, continuing until approximately 11:46 (EST)
the same day. The cause of the network outage remains unknown, but it seems to
have been associated with an event affecting our offsite facility hosting the
ACC computers.
The network outage impacted the Ultra-rapid and Rapid combinations in the
following ways:
* file retrieval processes were unable to get Analysis Center (AC) and other
  critical products via the internet for the duration of the network outage
* the network outage spanned the Ultra-rapid combination run for GPS Wk 1782,
  day 6, hour 12, and this had several adverse consequences:
    + the 1782_6_12 combination used only a single AC
    + it failed to complete properly
    + without manual intervention, a combination process that fails to
      complete properly prevents future combinations from running; thus,
      all subsequent Ultra-rapid combinations failed through 1783_1_06
    + as a result, no Ultra-rapid products were delivered for 1782_6_18
      through 1783_1_06
* the network outage also spanned the Rapid combination run for GPS Wk 1782,
  day 5, and this had different adverse consequences:
    + the combination omitted products from four ACs, but had enough
      to form combined orbits, clocks, and ERPs, which were delivered on time
    + the resulting combination was less rigorous than it would have been
      with the full set of ACs, so significant timescale instabilities
      were introduced
IGS Ultra-rapid products for 1782_6_12 through 1783_1_06 were restored and
resubmitted today by 10:21 AM (EDT). The subsequent Ultra-rapid combination for
1783_1_12 ran automatically without a problem. We expect the Ultra-rapid
products to continue normally.
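
For readers who want to check which Ultra-rapid epochs fall in the ranges
quoted above, the following is a minimal sketch, not the ACC's own tooling. It
assumes the labels follow a week_day_hour pattern (GPS week, day of week with
0 = Sunday, hour of day) and that Ultra-rapid epochs are spaced 6 hours apart,
as the labels in this message suggest. The function names are hypothetical, and
the small leap-second offset between GPS time and UTC is ignored.

```python
from datetime import datetime, timedelta

GPS_EPOCH = datetime(1980, 1, 6)  # start of GPS week 0 (a Sunday)
STEP_HOURS = 6  # assumed Ultra-rapid spacing: hours 00, 06, 12, 18


def label_to_datetime(label):
    """Convert a 'WWWW_D_HH' label to a datetime on the GPS time scale."""
    week, day, hour = (int(part) for part in label.split("_"))
    return GPS_EPOCH + timedelta(weeks=week, days=day, hours=hour)


def datetime_to_label(when):
    """Convert a GPS-time datetime back to a 'WWWW_D_HH' label."""
    elapsed = when - GPS_EPOCH
    week, day = divmod(elapsed.days, 7)
    hour = elapsed.seconds // 3600
    return f"{week}_{day}_{hour:02d}"


def epochs_between(first, last):
    """List every label from first to last (inclusive) at STEP_HOURS spacing."""
    current, end = label_to_datetime(first), label_to_datetime(last)
    labels = []
    while current <= end:
        labels.append(datetime_to_label(current))
        current += timedelta(hours=STEP_HOURS)
    return labels


# Combinations that failed, and the subset with no products delivered.
failed = epochs_between("1782_6_12", "1783_1_06")
undelivered = epochs_between("1782_6_18", "1783_1_06")

print(len(failed), failed)
print(len(undelivered))
print(label_to_datetime("1782_6_12").isoformat())
```

Under these assumptions the restored range 1782_6_12 through 1783_1_06 covers
eight Ultra-rapid epochs, seven of which (1782_6_18 onward) had no products
delivered at all, and 1782_6_12 corresponds to 12:00 on Saturday, March 8, 2014.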
IGS Rapid products for 1782_5 and 1782_6 were regenerated and resubmitted today
by 11:24 AM (EDT). This was done mainly to correct timescale instabilities that
resulted from having only three usable ACs in the original 1782_5 combination.
The Rapid combination for 1783_0 ran normally and automatically, and those
products were released about an hour ago. We expect the Rapid products to
continue normally.
It must be made clear that the IGS Ultra-rapid products contain 24h predictions
to help withstand short glitches (probably less than 12h long), like the one
caused by the network outage for 1782_6_12. However, the prolonged outage that
persisted over the full weekend is a major failure and is unacceptable.
The question that remains is why the ACC did not manually intervene on
Saturday morning to prevent the subsequent failures of the Ultra-rapid
combinations for 1782_6_18 through 1783_1_06. The circumstances were beyond the
ACC's control, and did not involve any issue at the offsite facility hosting
the ACC servers.
We are very sorry for the unfortunate events of this past weekend. We will do
everything within our ability to prevent such a prolonged outage from happening
again on our watch.
We are interested to know more about how the events of this past weekend
impacted your work. Please feel free to send any comments, suggestions, or
concerns to Kevin and me (igs.acc at noaa.gov).
Best regards,
Jake & Kevin (NOAA/NGS)