[IGS-DCWG-71] Re: [IGS-DCWG-70] Re: [IGS-DCWG-68] Re: [IGS-DCWG-67] Re: [IGS-DCWG-65] Change in compression scheme

Giovanni Sella Giovanni.Sella at noaa.gov
Fri May 7 14:09:49 PDT 2010


******************************************************************************
IGS-DCWG Mail      07 May 14:10:01 PDT 2010      Message Number 71
******************************************************************************

Author: Giovanni Sella

  Dear Colleagues,
With much delay, I'm adding my comments on this proposed change. I 
have talked to the NOAA archive group that supports us, the National 
Geophysical Data Center (NGDC), and got a very strong recommendation for 
gzip over bzip2. In fact, they pull only the gz copies of our files, not 
the Z copies. This is the basic logic:

gzip is used by most archive centers, as it is much faster than bzip2.

It supports archiving multiple files, like tar; bzip2 does not.

It does not compress as much as bzip2, but the tradeoff in speed 
makes it greatly preferred. This is a very important consideration 
given the ever-increasing number of stations. Analysis Centers (ACs) 
would pay a very high price in decompressing the files with bzip2.
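
The size-versus-speed tradeoff above is easy to measure on one's own 
files. The following is a minimal sketch using Python's standard gzip and 
bz2 modules; the function names and the synthetic sample data are 
assumptions for illustration, and real observation files may behave 
differently.

```python
import bz2
import gzip
import time


def measure(name, compress, decompress, data):
    """Return compressed size and wall-clock times for one codec."""
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    unpacked = decompress(packed)
    t2 = time.perf_counter()
    if unpacked != data:
        raise ValueError(f"{name} round trip changed the data")
    return {
        "codec": name,
        "bytes": len(packed),
        "compress_s": t1 - t0,
        "decompress_s": t2 - t1,
    }


def compare_codecs(data):
    """Run gzip and bzip2 (library default settings) over the same bytes."""
    return [
        measure("gzip", gzip.compress, gzip.decompress, data),
        measure("bzip2", bz2.compress, bz2.decompress, data),
    ]


if __name__ == "__main__":
    # Hypothetical stand-in for an observation file: columns of numbers.
    lines = (f"{i * 1.1:14.3f}{i * 2.2:14.3f}{i * 3.3:14.3f}\n" for i in range(200000))
    sample = "".join(lines).encode("ascii")
    for row in compare_codecs(sample):
        print(row)
```

Timings depend on the machine and the data, so the numbers are only 
meaningful when run against a representative set of station files.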

gzip has been used for many years, is tested on numerous platforms, 
and is well understood; bzip2 is a new beast.

gzip has the option to increase the compression level from the default 6 
to 9 with no extra CPU overhead when checksums are run.
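
One part of this can be checked directly: the CRC-32 stored at the end 
of a gzip file is computed over the uncompressed data, so it is the same 
at level 6 and level 9. The sketch below, using Python's standard gzip 
module, compresses the same bytes at both levels and reads the 8-byte 
trailer defined in RFC 1952. The helper names are assumptions for 
illustration.

```python
import gzip
import struct
import zlib


def gzip_trailer(blob):
    """Return (crc32, uncompressed size mod 2**32) from the last 8 bytes."""
    return struct.unpack("<II", blob[-8:])


def compare_levels(data, levels=(6, 9)):
    """Compress the same data at each level; report size and trailer."""
    report = {}
    for level in levels:
        blob = gzip.compress(data, compresslevel=level)
        report[level] = {"bytes": len(blob), "trailer": gzip_trailer(blob)}
    return report


if __name__ == "__main__":
    # Hypothetical sample; substitute the bytes of a real data file.
    sample = b"G01  20123456.789  105123456.123  45.0\n" * 50000
    for level, info in compare_levels(sample).items():
        print(level, info)
    print("crc32 of raw data:", zlib.crc32(sample) & 0xFFFFFFFF)
```

The compressed sizes will differ between levels; how much, and at what 
cost in compression time, depends on the data.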

NGDC has been after me for some time to add MD5 checksums as a more 
robust check than the ones in gzip. Just this week I rolled out this new 
set of files, one for every file they archive. I'm doing it forward in 
time for the moment and will then go back and do the older data as well. 
The very small MD5 checksum files could easily be exchanged between data 
centers to ensure that the same data holdings exist without having to 
query the gzip file itself.
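
As a rough illustration of how exchanged MD5 checksums could be used to 
compare holdings, here is a minimal sketch in Python. It is not the 
procedure in use at NGS or NGDC; the function names, the "*.gz" pattern, 
and the dictionary layout of the manifest are all assumptions.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory, pattern="*.gz"):
    """Map file name -> MD5 digest for every matching file in a directory."""
    return {p.name: md5_of_file(p) for p in sorted(Path(directory).glob(pattern))}


def diff_manifests(local, remote):
    """Compare two manifests without touching the data files themselves."""
    shared = set(local) & set(remote)
    return {
        "missing": sorted(set(remote) - set(local)),   # held remotely only
        "extra": sorted(set(local) - set(remote)),     # held locally only
        "mismatched": sorted(n for n in shared if local[n] != remote[n]),
    }
```

Two centers would each run build_manifest on their own holdings and 
exchange only the resulting name-to-digest tables; diff_manifests then 
lists the files that are absent or differ on either side.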

I personally am a very strong proponent of abandoning UNIX compress .Z 
and switching to .gz and have discussed this a couple of times in the 
past with Carey and Angie, but the re-processing was always looming and 
that was daunting enough without having to mix in a change in compression.

  My two cents. Have a nice weekend.
  Giovanni



Carey Noll wrote:
> ******************************************************************************
> IGS-DCWG Mail      01 Apr 12:42:21 PDT 2010      Message Number 70
> ******************************************************************************
> 
> Author: Carey Noll
> 
>    bzip2 will provide better compression according to my system
> administrator. He is currently running some tests of various
> compression utilities; I will send a summary report to the DCWG
> when he has finished his tests.
> 
>    I would recommend doing these changes in steps. Let's tackle
> this proposed compression change first before we start thinking
> about more "drastic" changes such as in filenaming, etc. That is
> my preference/recommendation anyways!
> 
>    I also would like to encourage more feedback on Mike Schmidt's
> earlier email proposal for OCs to push data to all global data
> centers. It sounds to me like an "easy" implementation to help
> keep data current and available at all GDCs.
> 
> Thanks,
> Carey.
> -----
> On Apr 1, 2010, at 11:42 AM, Nacho Romero wrote:

-- 
Giovanni Sella, Ph.D.
CORS Program Manager,
National Geodetic Survey
NOAA-NOS, SSMC3-8716, 1315 East-West Hwy.
Silver Spring, MD 20910, USA
Tel 001-301-713-3198x126, Fax 001-301-713-4324

For EXISTING CORS e-mail: ngs.corscollector @ noaa.gov
For PROPOSED CORS e-mail: ngs.proposed.cors @ noaa.gov

CORS Guidelines: www.ngs.noaa.gov/CORS/Establish_Operate_CORS.html


