I have, in the past, pointed out that the people who run the Office of Internet Technology do a seemingly awful job of maintaining the Williams network.

Well, it seems the situation has not changed:

> Williams Network Users,
>
> We are having Internet connection problems. I apologize for the lack of
> communication from me about this. At the end of the day Friday I thought
> things were on their way to healthy and posted a Daily Message which I
> thought would be distributed Saturday morning. It turns out it’ll go out
> Monday. The following text is mostly from the Daily Message with some
> updated information. If you don’t want to read the whole thing, the basic
> message is: We know that we’re having a problem and we are working
> non-stop to fix it. We are treating this as a network down emergency.
> We are very sorry for the inconvenience caused by this service outage. In
> trying to make things better we unfortunately made them worse.
>
> Williams College normally has two connections to the Internet. Up until
> last October one of them was to Global Crossings and the other to TVC,
> Tech Valley Communications. In early October the Global link failed after
> a major thunderstorm which blew one of the microwave dishes out of
> alignment. Global was unable to repair the link so we cancelled our
> contract with them. We immediately started working on a replacement. That
> replacement is now ready in the form of a high speed “lit fiber”
> connection to Springfield MA. In Springfield we connect to Crocker
> Communications for our Internet service.
>
> Friday we tried to bring the new connection online. This should have been
> a trivial exercise involving making the physical connection and turning on
> a router link. Unfortunately when we turned on the link the new set of
> routing tables downloaded from Crocker immediately exceeded the memory
> capacity of the router and the router crashed. As soon as it rebooted the
> process was repeated and it crashed again. This happened several times
> before we disconnected the Springfield line.
>
> The result of a router going up and down like that is called a flapping
> route. This causes routers in other parts of the Internet to ignore our
> network for a period of time in order to ensure the stability of their own
> router. It didn’t take very long for us to get the TVC connection running
> in the old configuration as our single link to the Internet, but by that
> time we were being cut off from large parts of the Internet. It can take
> up to six hours for the routing tables to settle down and work normally
> again.
>
> As I said, when we left Friday evening things were improving and we had
> every hope that within a few hours things would be back to normal.
> Saturday morning tracking graphs showed a fairly normal flow of traffic,
> but soon, under increased load, problems started developing once again.
> The problem now is that something is overloading the router. We are trying
> to figure out whether that is a configuration problem resulting from the
> attempt we made Friday to bring up the new connection (this is most
> likely), a denial of service attack, or something else.
>
> We had two sets of routing tables loaded in October with no problems and
> we expected that we would have no problems maintaining two connections
> again. Unfortunately reality doesn’t always meet expectations. We have
> been anxious to get the second link back up and running because a service
> problem with a single connection would result in a complete disconnection
> from the Internet. With two connections we can survive a failure of one
> with just a slight slowdown in access speeds. Next week we will add memory
> to the router and try again to bring up the connection. Once again, we are
> very sorry for the inconvenience.
>
> Regards,
> – Mark Berman, Director for Networks & Systems, OIT.

As I have said on many occasions, I do not understand technology well enough to know why Williams seems to suffer constantly from technology problems, at extremely inconvenient times, that do not affect those of us in the real world.

A friend of mine who does understand these things poses some questions about OIT’s handling of the Williams network:

> While the headline seems to be “oops, our connection is down,” I think
> it is much more interesting that Berman let Williams go four and a half
> months without a backup. As he, himself, notes… “a service problem
> with a single connection would result in a complete disconnection from
> the Internet.” And at the very least, why would he attempt to make this
> transition in the middle of the semester when he already had three
> lengthy breaks (Thanksgiving, Christmas, Dead Week) to try it out?

Perhaps the Williams community could finally get the answers that I never received when, several years ago, I sent the letter linked to above?
