Australian Ford Forums > General Topics > Ford Forums Central > Site Support


 
 
26-06-2010, 01:12 PM   #1
russellw
Chairman & Administrator
Outage

Good evening

Happily, the forums are now active again after a 60-hour outage.

I expect that things will be a little slow for a while, as a number of replication and copying tasks are still running in the background, so a little more patience is required.

Let me start by offering my thanks to Chris (cs123) for the effort and skill he put into what looked like an almost impossible recovery task. He's not had much sleep since Friday evening either, and without his assistance we would probably not have succeeded.


For those who are technically minded, here is the sequence of events:
  • At about 0600 on Thursday morning, one of the disks in the server array failed. It had probably been going bad for some time, which would explain our recent random issues. On the plus side, we tuned a lot of other things that needed doing along the way, but it was still painful.
  • The failure of one disk should not have been an issue - we maintain a RAID array specifically to avoid problems of this nature - but (as usual) Murphy's law applied: the second drive hadn't been mirroring since the server relocation the previous weekend. The techs advised us that this meant no data was recoverable.
  • Even that shouldn't have mattered, as a full system backup was supposed to have been taken before the server relocation, but Murphy was active again and they didn't do it - the reasons I was given are basically beyond belief, but that was that.
  • We also take an off-site backup every month, and this was the next recovery option. These are large files, and uploading one takes in the order of 25 hours, so we decompressed it here and uploaded it in parts, which took only 9 hours! After uploading the file and running a restore, we found that only a third of the database tables were available, thanks to a CRC error in the archive.
  • We were in the process of copying up the original archive (another 25 hours), so that we could run some repair tools on it and recover the remaining tables, when Chris found a way of recovering the data from the original drives.
  • It has still taken some hours since then to copy, move and test the functionality of the server, but at this stage we are happy that it should be reasonably stable.
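For the technically minded, the silent loss of mirroring in the second point is the sort of thing a scheduled check can flag. What follows is only a rough sketch, and it assumes Linux software RAID (md), which reports array state in /proc/mdstat; that may not match this server's setup, and a hardware controller would need its own vendor tools instead. The sample text and the function name `degraded_arrays` are made up for illustration.

```python
import re

# Sample /proc/mdstat content (illustrative only, not taken from the real server).
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1]
      104320 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
      488279488 blocks [2/1] [U_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose member map shows a missing disk."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status line ends with a map such as [UU] (all members up)
        # or [U_] (one member missing, so the mirror is not in sync).
        status = re.search(r"\[([U_]+)\]\s*$", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded


if __name__ == "__main__":
    # On a real md host you would read the text from /proc/mdstat instead.
    print(degraded_arrays(SAMPLE_MDSTAT))
```

Run from cron with an email alert on a non-empty result, a check along these lines would report a mirror that stopped syncing within hours rather than after the surviving disk fails.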
We will obviously be using this exercise to learn what we can do better to avoid a catastrophic failure such as this - or at least to have more recovery options open to us when it does happen, given that it has been only 12 months (almost exactly) since the last RAID failure.
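One option along those lines is to test each off-site archive when it is made, rather than finding a CRC error in the middle of a restore. The sketch below assumes the backup is a gzip-compressed file, which is only a guess at the format; the function names are illustrative. It reads the whole archive so the stored CRC is checked, and it computes a SHA-256 hash that can be recorded next to the backup and compared after any upload or copy.

```python
import gzip
import hashlib
import zlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a large archive need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def gzip_is_intact(path, chunk_size=1024 * 1024):
    """Decompress the whole file and throw the output away.

    The gzip module compares the stored CRC-32 and length with the data
    when it reaches the end of the stream, so a full read detects a
    corrupt or truncated archive.
    """
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        return False
    return True
```

Checking the hash on both ends of a 25 hour upload would also show whether a file was damaged in transit or was already bad at the source.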

Thanks to everyone for their patience. To GT and Melz, thanks for keeping the Facebook page up to date; to wulos, Auslandau and Falcon Coupe, thanks for the moral support; and to all those who offered assistance, thanks.

There are still some issues to be resolved with mail services and a couple of other items, but we will work through these as we go.

Regards
Russ

__________________

__________________________________________________

Observatio Facta Rotae


 




All times are GMT +11.

