[Micronet] Requesting feedback on OE-IT servers and data centers case study

Russell Connacher
Hi all,

As the end of the Operational Excellence design phase draws near, the IT design
team is making a final push to make sure everyone has had adequate opportunity
to give feedback on our case studies. I've been charged with incorporating
feedback into the case study on servers and data centers.

We've already received some very detailed and valuable comments and corrections
on it -- all very much appreciated. Still, since it has only recently been
published, I'm making this special plea that you all take the time to read it
and comment however you see fit. I'm especially seeking those of you who operate
servers, officially or unofficially, physically under your desk or virtually in
the cloud.

You can find the case study on the OE-IT design team's bSpace site
<http://bspace.berkeley.edu/join/portal/site/7ea7d89c-b87c-4221-a2ea-16aee252b967/>
under Resources / Topic - Servers and data centers.

The best vehicle for providing feedback is through the on-line form set up at
<https://spreadsheets.google.com/viewform?formkey=dEc4RGp4Y1g4TlpZVWYzWFZud2JvY2c6MQ>
(Be sure to select "Case: servers and datacenters" there so I will be sure to
find your comments.)

Thanks for your continuing efforts to make Cal's IT work better, on this
initiative and every day,
Russ



--

Russell Connacher                   206 Evans Hall # 2924
Information Systems                 Berkeley, CA 94720-2924
Undergraduate Advising              (510) 643-9892
College of Letters & Science, UCB   [hidden email]

(please consider the environment before printing this email)

 
-------------------------------------------------------------------------
The following was automatically added to this message by the list server:

To learn more about Micronet, including how to subscribe to or unsubscribe from its mailing list and how to find out about upcoming meetings, please visit the Micronet Web site:

http://micronet.berkeley.edu

Messages you send to this mailing list are public and world-viewable, and the list's archives can be browsed and searched on the Internet.  This means these messages can be viewed by (among others) your bosses, prospective employers, and people who have known you in the past.

Re: [Micronet] Requesting feedback on OE-IT servers and data centers case study

Michael Sinatra-2
On 3/4/11 3:52 PM, Russell Connacher wrote:
> Hi all,
>
> As the end of the Operational Excellence design phase draws near, the IT design
> team is making a final push to make sure everyone has had adequate opportunity
> to give feedback on our case studies. I've been charged with incorporating
> feedback into the case study on servers and data centers.

I hope you don't mind if I give my feedback on Micronet.  It's hard for
folks to know if someone has given feedback in the past on the same
topic.  (I see that the last IT "feedback received" document on bSpace
is from December 2010.)

The biggest problem I see with the document is that there is an
inadequate analysis of the trade-offs involved in the recommendations
you are considering.  The biggest and most obvious of these is the
optimization of network backhaul (or relative failure to do so) that
comes from service location options.  A really interesting example
surfaced a few years ago when I was discussing the placement of a
particular computational node with a campus department.  The department
wanted to place a large tape unit connected to a server in the
department space but have it transfer data to a computational node in
the data center.  The tape unit needed to be in the department so that
research staff could load and retrieve data tapes when new datasets were
physically received from other locations.  The data would then be
transferred to the computational node, and the results would be
retrieved by users *back in the department.*  Obviously there were
bandwidth and latency implications.

I told the department that this was a highly un-optimized solution; they
were wasting a huge amount of network backhaul just to put their
computational node in the data center.  The departmental administrator
responded that he had considered that issue, but the power and cooling
requirements of the computational node were sufficient that location in
the campus data center outweighed the network backhaul issues.  The
investment required to house the system in a local facility was too
great.  In this case, the departmental administrator had carefully
considered the trade-off.  That may also have been done by the
committee, but there is no evidence of it, and it therefore can't be
reviewed by campus folks.  That's where I see the big problem.

The backhaul issue also has implications for energy consumption.
Placing a computational node topologically far from its data source will
increase the amount of network equipment and power consumption required
to transfer the necessary data.  Such equipment cannot be consolidated
or virtualized in the same way that traditional computing resources can.
Power budgets for network equipment are usually readily available, and
such data can be used to weigh the relative merits of the location of
services.

Another case in which the trade-offs are inadequately considered is in
the issue of virtualization.  Virtualization is highly effective at
reducing the inefficiencies of multiple physical servers, but it also
adds a layer of complexity--with a concomitant set of potential security
vulnerabilities.  The savings from virtualization need to be offset
against the additional labor that will be spent configuring and
troubleshooting systems that are part of a virtualized infrastructure.
Even where such services are provided on a recharge basis, there has
generally not been such a full accounting.  This type of accounting can
supplement what is already known about the weaknesses of virtualization
(such as I/O) and can be used to provide a better picture
of when virtualization makes sense and when it doesn't.  While I don't
expect the current document to be able to provide that level of
granularity, it should at the very least provide a framework from which
a set of criteria can be derived (I am thinking here of what OrgSimp is
trying to do), and I don't see that in the current document.

A third case where failure to articulate trade-offs weakens this
document is in the discussion of outsourcing.  In this case, the
backhauling issue rears its head again, and so does complexity.  The
problem here is that when you outsource to "Google" or "Amazon," you are
actually outsourcing to Google or Amazon, *and* everyone between you and
them, which may include service providers you have never even heard of
and with whom you have no contract or SLA.  Moreover, the path that
network traffic takes between you and Google or Amazon may change from
day to day or hour to hour and may not match the path that the traffic
takes back to you.

Ask any network engineer what they love about troubleshooting and the
notion of dealing with multiple administrative domains on a single
problem is usually placed in the vicinity of the seventh circle of Hell.
We already have issues dealing with multivendor layers in single
systems, and it's easy to see why they are so difficult for those involved.

Another aspect of the outsourcing question that has bugged me for some
time has to do with talent.  The document makes statements like, "we
will enable the IT staff who now manage small servers to focus their
considerable engineering talents on innovation rather than maintenance."
I am not sure what sort of innovation we expect to happen here (will
they invent the next Facebook?), but my experience is that when people
who are good at running things are forced to do other things, they
simply pack up and leave.

For what it's worth, I wouldn't have stuck around at UCB for so long if
I hadn't gotten started running a small server.  (Fortunately, it was on
top of a desk, not under it, so it wasn't such an egregious OE
Violation.)  Where is the next generation of Erik Klavons, Rune
Stromsnesses, and Mike Howards--all people who started out as students
running departmental services--going to come from?  Let's face it, the
university doesn't exactly give the outward appearance of being a
paradise of a workplace, given the state budget situation and current
political climate.  We can't be the New York Yankees of IT; we need to
develop talent from within, and you can't have a reasonable IT
organization if it's filled with Mark Zuckerberg wannabes.

It's another trade-off and one of several not considered by the case
study.  But the biggest problem is that the case study doesn't even
acknowledge that there might be any trade-offs at all--just
"implementation challenges."  The latter is synonymous with short-term
hurdles, while the former deals with the long-term and comprehensive
costs of pursuing a particular direction.  Without that, the case study
is incomplete.

michael

 