Dallas Connectivity Issues

Jul 30, 2010 0:39

Some individuals are reporting connectivity issues to the Dallas network. This does not appear to be a total outage, but it is affecting a large number of routes.

Network engineers onsite are investigating the issue now.

7/30/2010 12:45AM EDT — Things appear to be stabilizing and returning online now.

Edison — Urgent Kernel Upgrade

Jul 16, 2010 10:27

Hello,

We are seeing anomalies in the kernel version running on the Edison hardware node. We believe it is in everyone’s best interest to go ahead and reboot this node into the latest version of the OpenVZ kernel.

We do apologize for any inconvenience this may cause. The server will return online and all VPS will be started within 30 minutes.

Thank you,
Server Complete, LLC

ATL-NAS1 Maintenance

Jul 7, 2010 11:41

We will be performing maintenance on the ATL-NAS1 (66.71.253.66) backup server over the next 24 hours. This NAS is getting seamlessly migrated to new, more redundant hardware.

There will be a brief period of downtime as the maintenance finalizes. This should take no more than 1-2 hours.

UPDATE – 1:13PM EDT (7/8/2010)
All maintenance on ATL-NAS1 has been completed. Everything has been seamlessly migrated to a new, more redundant data store. No other changes were made and all data remained intact.

Edison Troubles

Jun 28, 2010 7:24

Hi everyone,

As I am sure those affected are aware, there have been two unacceptable outages, one today and one yesterday, on the Edison hardware node.

This issue is particularly interesting in that our monitoring systems are not picking it up due to the nature of the load. The machine is going to 100% I/O wait while the idle % of the CPU is at 99%, meaning the CPU itself is sitting idle but the hard disks are being hit, and hit very hard.

This is basically what it looks like when logging into the box:

07:19:48 up 23:27,  1 user,  load average: 994.64, 984.78, 956.41
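On Linux, time spent waiting on disk is reported separately from user and system CPU time, so a monitor that only watches CPU utilization can miss a load like this one. The following is a minimal sketch, not the check we actually run: it reads the load average from an `uptime` line and computes the I/O wait share from two samples of the `cpu` line in `/proc/stat`. The function names and the alert thresholds are hypothetical and chosen only for illustration.

```python
import re


def parse_load_average(uptime_line):
    """Return the 1-, 5- and 15-minute load averages from an `uptime` line."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())


def cpu_shares(before, after):
    """Percent idle and iowait between two samples of the /proc/stat "cpu" line."""

    def fields(line):
        # Columns after the "cpu" label:
        # user nice system idle iowait irq softirq steal
        return [int(value) for value in line.split()[1:9]]

    deltas = [b - a for a, b in zip(fields(before), fields(after))]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {
        "idle": 100.0 * deltas[3] / total,
        "iowait": 100.0 * deltas[4] / total,
    }


def looks_io_bound(load_1min, iowait_pct, cores, load_factor=4.0, iowait_limit=50.0):
    """Hypothetical alert rule: flag a node on very high load or high I/O wait.

    load_factor and iowait_limit are assumed values, not tuned thresholds.
    """
    return load_1min > load_factor * cores or iowait_pct > iowait_limit
```

Fed the `uptime` line above, `parse_load_average` returns the three load figures as floats, and `looks_io_bound` would flag the node on load alone for any realistic core count.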

We have traced the issue to our R1Soft CDP backup platform. The R1 backup kernel module is spiraling out of control, causing the I/O on the machine to become unstable. The only way to kill this off is to reboot the machine.

At this time, we are waiting for someone in the facility to reboot the machine. Once they do so, it will return online and everything will be back to normal.

The R1Soft CDP backups on this node are scheduled to start at 4:00AM EDT every day. We have disabled the backups on this node and will be contacting R1Soft later today to have them perform a formal investigation and rectify the issue.

The node and all VPS will return online as soon as possible.

Thanks,
Daniel Stephens
Server Complete, LLC

Tarzan [ Xen ] Urgent Reboot

Jun 23, 2010 15:54

The Tarzan Xen node out of San Jose is going down for an emergency reboot to apply a critical kernel patch to the host node.

We apologize for any inconvenience this may cause… Normally we would schedule maintenance such as this, but it is a critical security patch.

Thanks,
Server Complete, LLC

UPDATE — 4:17PM EDT
The node is back online and VMs are presently starting up.

UPDATE — 4:29PM EDT
All VMs should be back up within the next 5-10 minutes as they complete their boot process. Degraded performance for about 30 minutes will be normal while everything stabilizes.

Edison Down

Jun 6, 2010 9:41

Hello,

Our monitoring systems have just alerted us to an issue with the Edison hardware node. We are looking into it.

Thanks

9:42AM EDT – The issue has been resolved. The datacenter identified a problem with a distribution switch that required an upgrade to the Cisco IOS. This upgrade required the switch to be rebooted, which caused a brief (5 minute) interruption of service. Everything is fine now and back to normal.

Maintenance Started

May 27, 2010 22:00

The previously announced maintenance window has begun on the Cyprus hardware node.

12:34AM EDT — The maintenance window on Edison has begun and is in progress.

1:10AM EDT – Edison is currently undergoing a chassis swap to change its form factor from a mid-tower to a 1U chassis. The hardware inside will remain the same; only the case is changing. It will return online as soon as possible, hopefully within the outlined maintenance window.

2:00AM EDT — The rack containing the Edison hardware node is still being worked on by datacenter technicians. No ETA is available at this time, but the node will return online ASAP.

2:32AM EDT — The switch in the rack containing the Edison hardware node has failed. A replacement switch was put into place and VLAN configs restored from backups. Edison is currently undergoing an FSCK and will return online as soon as it is finished.

2:54AM EDT – The node is back online and all VPS are either booted or in the process of booting. It appears all but 2 have already booted. The remaining two will be back online momentarily.

Thank you for your patience everyone! This maintenance window is now closed.

Upcoming Maintenance [ Atlanta -- Some Nodes ]

May 24, 2010 23:21

Hi everyone,

We have scheduled a planned maintenance window on some Atlanta hardware nodes later this week. The affected nodes are Edison (Linux VPS) and Cyprus (Windows VPS). This maintenance is to increase the power and network redundancy on a segment of the network.

This maintenance also affects a couple of dedicated server clients; those affected have been personally contacted with an individual time frame.

The maintenance windows are as follows:

Edison – Friday, May 28, 2010 // 00:00 – 02:00 EDT
Cyprus – Thursday, May 27, 2010 // 22:00 – 23:59 EDT

All customers affected should have already received a maintenance notice via email.

Updates during these maintenance periods will be posted right here on the status blog.

Thanks,
Server Complete, LLC

SC Site Down

May 18, 2010 12:53

Hello everyone,

The SC site as well as Mission Control have been down for roughly 30 minutes. The datacenter where we host our internal systems is experiencing network difficulties, and both will return online ASAP.

No customer equipment is affected.

If you have any questions, please feel free to shoot me an email — daniel @ servercomplete . com

Thanks,
Daniel Stephens
Server Complete, LLC

DALLAS NETWORK MAINTENANCE

May 7, 2010 19:22

This is a reminder of the Dallas Network Maintenance this evening starting at 11:00PM EDT. The allotted time frame for this maintenance is 5 hours; however, it should not take longer than an hour.

We will be performing kernel upgrades on all Dallas machines as well as increasing the overall network capacity to all racks.

If you have any questions, please open a ticket from Mission Control.

Thanks,
Server Complete, LLC

UPDATE – 10:37PM EDT
We have finished running a manual R1Soft backup of all of our hardware nodes in Dallas to our offsite CDP server. Maintenance will continue as planned promptly at 11:00PM EDT.


UPDATE – 10:55PM EDT
Pre-maintenance checks have been completed and everything is in order. Containers will begin shutting down in roughly 3 minutes in preparation for the maintenance. Further updates will continue to be posted here.


UPDATE – 11:01PM EDT
The shutdown sequence has been initiated across all nodes and containers are cleanly shutting down. At this time, roughly half of the Dallas network is in a shutdown state.


UPDATE – 11:06PM EDT
All containers have been safely placed in a powered-off state and nodes are beginning their shutdown sequence now. This will be the last update until nodes begin returning online.


UPDATE – 11:24PM EDT
Containers are beginning to boot on some nodes now. Further nodes will be back online momentarily.


UPDATE – 11:33PM EDT
The Titanium hardware node is running a File System Check (FSCK) and will return online as soon as it is completed. Most of the other nodes are either already back online or in the process of booting at this time. A couple of nodes are still being worked on.


UPDATE – 11:40PM EDT
The remaining two nodes being worked on are Titanium and Otto. Titanium is still undergoing a File System Check (FSCK) and will return online as soon as it is complete. Otto will be returning online momentarily.


UPDATE – 11:51PM EDT
Otto has returned online and all VPS on it should be either already online or online within the next 5-10 minutes. Titanium is at 85% on the FSCK and will return online soon. Everyone else should be back online and good to go!

UPDATE – 11:58PM EDT
All nodes are back online and all VPS have been booted! Thank you everyone for your understanding and patience! If your VPS is inaccessible, PLEASE open a ticket ASAP so that we may check into it for you!