Re: [Micronet] CalMail Status


Christopher Brooks
Shel,
I appreciate all the efforts that you and the staff are going through to
bring Calmail back on line.

I take this as a wake-up call for disaster recovery plans for my own cluster.

Under what circumstances do we fall back on a different set of Calmail
hardware?

Presumably there are disaster recovery plans that have us host our email
system elsewhere.

Are those plans publicly available for CalMail?  If the building that
contains Calmail is destroyed, what happens?  Is there some sort of
service level agreement under which, if a certain level of service is not
met, we move our mail system somewhere else (UC Davis?)?

At this point, it seems that moving Calmail to another location is not
necessary, but at what point would Calmail be moved?

These are all good questions whose answers belong in a lessons-learned
document that I hope is forthcoming.  I'd be happy with an email message to
Micronet that addresses these topics.

I briefly searched for information about Calmail's disaster recovery plans
but did not find anything.  It would be good for OE to determine what sort of
disaster recovery information should be made available and to have sites
like Calmail and the Hub provide this information.

I, for one, see this as an excellent moment to update my list of
alternative email addresses for my frequent campus contacts.

I'm also going to take this opportunity to check my backup system and
push forward on replacing some old hardware.

Many thanks again to everyone for all their efforts over the weekend.  I'm
sure some long, anguished hours have been put in.

_Christopher

Christopher Brooks, PMP                       University of California
CHESS Executive Director                      US Mail: 337 Cory Hall
Programmer/Analyst CHESS/Ptolemy/Trust        Berkeley, CA 94720-1774
ph: 510.643.9841                                (Office: 545Q Cory)
home: (F-Tu) 707.665.0131 cell: 707.332.0670

Shel writes:

> I very much appreciate that it is unacceptable to have a core campus
> system like CalMail offline for any length of time.  Today's continued
> problems with CalMail have been particularly difficult for all involved.
> We work hard to design and operate systems that can handle the needs of
> our community, and when we fail to meet that standard we bring in outside
> experts to help us improve.  We have added outside experts from other
> campuses and vendors, as well as new team members from the Berkeley
> technical community, as we work through this crisis. This is our highest
> priority and will remain so until we have the environment fully stabilized.  The
> following message provides current information.  You can also check
> http://systemstatus.berkeley.edu for the latest status.
>
> Regards,
> Shel Waggener
> Associate Vice Chancellor and CIO
>
> CURRENT STATUS SUMMARY
> 1) The CalMail system is available only through web clients at
> http://calmail.berkeley.edu. All messages are being sent and received but
> can only be accessed via webmail.  Webmail sessions may be slower than
> normal due to volume.
> 2) Students are strongly encouraged to forward their email to alternate
> email accounts. Instructions for how to do that can be found on the
> Calmail site (http://calmail.berkeley.edu, Manage Account option).
>
> DETAILS:
>
> CalMail has been substantially impacted during the last 36 hours after the
> successful recovery from database corruption this weekend.  The load on
> the system has remained extremely high as the millions of backlogged
> messages are delivered.  The load situation worsened considerably Monday
> morning as tens of thousands of campus community members returned from the
> holidays and connected for the first time, pushing the load above
> operating limits.  Normal processes that usually run in the background
> unnoticed, including copying data off of a failed hard drive, effectively
> shut down the system for many people.  Attempts to keep email moving
> during the day Monday were minimally successful.  Monday night, during off
> hours, work was undertaken to accelerate the repair of the failed disk and
> prepare for the anticipated continued high load through this week.  On Tuesday
> morning, the load exceeded even the unusually high levels experienced on
> Monday and ultimately caused the entire system to become unusable.
> Unfortunately, the root cause of this problem - insufficient capacity on
> the legacy CalMail environment - cannot be resolved safely without the new
> storage array, which is not expected to be available until the weekend in
> spite of overnight delivery of key components and around-the-clock work by
> staff and vendors. We recognize how critical email is, this week in
> particular, to our ability to perform our work and have taken the
> following immediate actions to assist in lowering the load to allow email
> to continue to flow.
> 1) While faculty and staff email must remain on a university-provisioned
> email service, students with external email accounts are encouraged to
> forward their CalMail messages there.  Doing so will lower both email
> volume and login attempts and thereby reduce load on the CalMail system.
> 2) Access to email from anything but the CalMail web clients (available at
> http://calmail.berkeley.edu) has been disabled. This dramatically reduces
> the number of simultaneous connections from cell phones, iPads, and
> clients such as Outlook and MacMail, which are often configured to
> maintain persistent connections and in doing so place extremely heavy load
> on the CalMail platform.
> 3) Some users are being moved temporarily to other campus email services
> to further reduce load.
>
> These are drastic actions, but none were undertaken lightly; all were
> taken after extensive consultation with technical experts from campus
> central and departmental staff, as well as vendors we have enlisted to
> work on the problem.
>
> Once the new storage system is installed, configured and tested, we will
> begin the migration process from the legacy storage system to one more
> than double in size and expected to handle at least several times our
> current load.  That process itself will put substantial load on the legacy
> storage environment as data is copied, so we plan to do this work during
> off hours.
>
> We will continue to provide updates and information about the latest plans
> via Calmessages as well as posting the information here:
> http://ist.berkeley.edu/ciocalmailupdates
> We will also continue to provide regular updates via
> http://systemstatus.berkeley.edu
>
>



 
-------------------------------------------------------------------------
The following was automatically added to this message by the list server:

To learn more about Micronet, including how to subscribe to or unsubscribe from its mailing list and how to find out about upcoming meetings, please visit the Micronet Web site:

http://micronet.berkeley.edu

Messages you send to this mailing list are public and world-viewable, and the list's archives can be browsed and searched on the Internet.  This means these messages can be viewed by (among others) your bosses, prospective employers, and people who have known you in the past.

William
First, I'd like to say that all web-based email clients suck.  They're entirely
lacking in the usability features that I'm accustomed to. /rant

Thankfully, it is a fallback.  I second the kudos to the team working on the
system for figuring out a way to keep the mail system running despite the
heavy load.

P.S. Overnight delivery doesn't seem to mean what it used to.



> Shel,
> I appreciate all the efforts that you and the staff are going through to
> bring Calmail back on line.



 

Greg Merritt
In reply to this post by Christopher Brooks
The CalMail plan over the last year or so, at least as perceived here
from the outside, has been "underfund due to lack of money, cross fingers,
hold breath, wait until we can outsource in 2012."

A calculated risk.  Unfortunately, the centralized savings (reduced admin
staff and insufficient infrastructure maintenance/growth) have probably
been instantly negated by uncountable distributed costs spread over the
entire campus population due to the recent problems.

This is not entirely unlike the reduced frequency of the emptying of
wastebaskets from campus offices -- small acute centralized savings,
distributed hidden incremental cost increases hitting budgets all over
campus -- but multiplied many, many times.

An emerging theme, perhaps?

-Greg, who wishes he could remember where he put that magic wand...



On Wed, 30 Nov 2011 04:30:56 -0800, [hidden email] wrote:

> Shel,
> I appreciate all the efforts that you and the staff are going through to
> bring Calmail back on line.

 

Greg Merritt
In reply to this post by William
On Wed, 30 Nov 2011 08:40:09 -0800, [hidden email] wrote:
>
> P.S. Overnight delivery doesn't seem to mean what it used to.


Last night I printed out my research group's Thursday meeting schedule and
taped it up to the office's exit doors.

-Greg

 