CalMail Update: Wednesday, November 23, 2011
This is a status update on CalMail for the campus technical community. It covers the recent email problems and their root causes, along with the corrective actions the extended CalMail team* has completed and those it has planned. I will host a conference call on November 29 at 12pm, open to the campus community, to brief anyone interested in the content below and to take questions. For call-in information, please register at: http://goo.gl/Jkj1r
The root cause of the problems affecting campus email is storage equipment performance. The storage area network (SAN) that serves CalMail is nearing the end of its lifecycle and is running too close to its maximum disk I/O capacity, and its slowness directly affects CalMail performance. Almost every other issue reported over the past few weeks derives from this underlying cause (including the email list and folder problems, which were fallout from the 10/25 outage).
The storage team has submitted an expedited order for a new, much faster unit, which will arrive on campus on December 2. Moving the large volume of email off the old unit is a major undertaking, because the team must balance the speed of the data transfer against the fact that the transfer itself can strain the limited disk I/O. We expect the transfer to take significant time, running mainly at night. Once it is complete, the extended CalMail team will schedule the cutover to the new storage unit. We expect the migration and cutover process to run from December 19 through January 9.
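The trade-off described above, moving data quickly without saturating disk I/O and doing most of the work at night, can be illustrated with a minimal Python sketch. This is not the storage team's actual tooling. The function names, the 22:00 to 06:00 window, and the byte-rate cap are all assumptions chosen for illustration.

```python
import io
import time
from datetime import time as dtime


def in_night_window(now, start=dtime(22, 0), end=dtime(6, 0)):
    """Return True if `now` (a datetime.time) falls inside [start, end).

    The default window (assumed, not from the announcement) wraps past
    midnight, so start > end is handled explicitly.
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def throttled_copy(src, dst, max_bytes_per_sec, chunk_size=64 * 1024,
                   clock=time.monotonic, sleep=time.sleep):
    """Copy file-like `src` to `dst` without exceeding max_bytes_per_sec.

    After each chunk, sleep long enough that the average rate since the
    start stays at or below the cap. Returns the number of bytes copied.
    """
    copied = 0
    started = clock()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        # Minimum time that must have elapsed for `copied` bytes at the cap.
        min_elapsed = copied / max_bytes_per_sec
        elapsed = clock() - started
        if elapsed < min_elapsed:
            sleep(min_elapsed - elapsed)
    return copied
```

A scheduler could call `in_night_window` before starting each batch and lower `max_bytes_per_sec` when it has to run during the day. The `clock` and `sleep` parameters exist only so the pacing logic can be exercised without real delays.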
In the meantime, the CalMail team has taken a number of steps to mitigate the impact of the overloaded storage system. These include automated monitoring and some systems automation that tunes the handling of mail queues on the fly, with the aim of avoiding major downtime. Until the storage is replaced in late December, there will be brief hiccups that affect people checking email. We will do everything we can to keep the system from an outage of the magnitude experienced on 10/25.
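As an illustration of what tuning mail queue handling on the fly could look like, here is a minimal Python sketch of one common approach: back off sharply when storage latency is high and recover slowly when it is low. The announcement does not describe the team's actual mechanism, so the function name, the latency thresholds, and the concurrency limits below are all hypothetical.

```python
def next_concurrency(current, latency_ms, low_ms=20.0, high_ms=50.0,
                     floor=1, ceiling=20):
    """Pick the next number of concurrent mail deliveries.

    Hypothetical policy (additive increase, multiplicative decrease):
    - latency above high_ms: halve concurrency, but never below floor
    - latency below low_ms: add one worker, but never above ceiling
    - otherwise: leave concurrency unchanged
    """
    if latency_ms > high_ms:
        return max(floor, current // 2)
    if latency_ms < low_ms:
        return min(ceiling, current + 1)
    return current
```

A monitoring loop would sample storage latency at a fixed interval, call `next_concurrency`, and apply the result to the mail system's delivery settings. Halving under load sheds I/O pressure quickly, while the slow increase avoids pushing the storage straight back into overload.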
There have been two other problems unrelated to storage, both of which affected CalMail but were not caused by it. First, an authentication problem prevented some web client and AppleMail users from logging in. Second, an anti-phishing false positive caused some legitimate emails to be rejected. The authentication problem began after the CalNet authentication servers in the data center were migrated to a new firewall infrastructure on 10/26. The migration changed how the firewalls handled some unusual, previously unknown behaviors in the communications between the CalMail web hosts and the CalNet systems, which caused connectivity problems between them. The issue was identified and mitigated on 10/31. The rejection of legitimate email was caused by human error: on 11/17, support staff saw an http://www.w3.org/ URL in a phishing message and mistook it for the message's target URL. This was identified and corrected on 11/22.
We have been aggressively recruiting for email system administration and now expect to have the first new person on board on Monday of next week.
Director, Campus Technology Services
* The extended CalMail team includes the CalMail team plus the storage, Unix, network, and database teams, plus CalMail emeritus staff.