[Micronet] Webcrawling policy (using a campus machine/IP to crawl an external site?)

[Micronet] Webcrawling policy (using a campus machine/IP to crawl an external site?)

Jack Burris
Howdy,
I have a PhD student who would like to run a webcrawler on an external social media site for ten days.  Are there campus policies regarding this?  I'm pretty sure he is planning to run it with a named user agent.  

Your thoughts, suggestions?

Thanks,
Jack Burris
SSDL/SSCL



 
-------------------------------------------------------------------------
The following was automatically added to this message by the list server:

To learn more about Micronet, including how to subscribe to or unsubscribe from its mailing list and how to find out about upcoming meetings, please visit the Micronet Web site:

http://micronet.berkeley.edu

Messages you send to this mailing list are public and world-viewable, and the list's archives can be browsed and searched on the Internet.  This means these messages can be viewed by (among others) your bosses, prospective employers, and people who have known you in the past.

Re: [Micronet] Webcrawling policy (using a campus machine/IP to crawl an external site?)

Lisa Ho
Hi Jack,

If the social media site and posts are public, then I don't think any of our campus or UC policies address webcrawling them.  

An operational note from System and Network Security:

It's possible for webcrawlers to go awry and cause problems that
resemble an attack on the target resource.  If it begins to look like
an attack, either due to security ops monitoring or due to a report
from outside the sec ops team, then we would likely ask the person(s)
running the webcrawler to correct the situation or stop the crawler.  If
that doesn't correct the problem, then in the case of an external
report, we're very likely to knock the device off the network.
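
One way to keep a crawler from looking like an attack is to space out requests and slow down further when the target starts returning errors. The following is only a minimal sketch of that idea, not anything Security has specified: the class name Throttle, the 2-second minimum delay, the 300-second cap, and the choice to treat HTTP 429 and 5xx as "back off" signals are all assumptions for illustration.

```python
import time


class Throttle:
    """Enforce a minimum delay between requests and back off after errors.

    All defaults here are illustrative assumptions, not campus requirements.
    """

    def __init__(self, min_delay=2.0, max_delay=300.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.delay = min_delay      # current delay between requests, in seconds
        self._clock = clock         # injectable so the logic can be tested
        self._sleep = sleep
        self._last = None           # time of the previous request, if any

    def wait(self):
        """Block until at least `self.delay` seconds have passed since the last call."""
        if self._last is not None:
            remaining = self.delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()

    def record(self, status):
        """Double the delay (up to max_delay) on 429 or 5xx; reset it otherwise."""
        if status == 429 or 500 <= status < 600:
            self.delay = min(self.delay * 2, self.max_delay)
        else:
            self.delay = self.min_delay
```

The crawler would call wait() before each request and record() with the HTTP status code afterwards, so a struggling server sees fewer requests rather than more.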

Crawling violates some sites' terms of service, so check the target site's terms before starting.  
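
A site's robots.txt is not its terms of service, and passing this check does not replace reading the terms, but it is a machine-readable signal that a crawler can honor automatically. Here is a rough sketch using Python's standard urllib.robotparser together with a named user agent. The user agent string, the contact address, and the helper names build_robots, allowed, and fetch are hypothetical placeholders.

```python
from urllib.request import Request, urlopen
from urllib.robotparser import RobotFileParser

# Hypothetical named user agent; include a way for the site to reach you.
USER_AGENT = "ExampleResearchCrawler/0.1 (+mailto:crawler-admin@example.edu)"


def build_robots(robots_txt):
    """Parse the text of a robots.txt file into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser, url, user_agent=USER_AGENT):
    """Return True if robots.txt permits this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


def fetch(parser, url, timeout=30):
    """Fetch a URL with the named user agent, or return None if disallowed."""
    if not allowed(parser, url):
        return None
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=timeout) as response:
        return response.read()
```

If the site publishes a Crawl-delay, parser.crawl_delay(USER_AGENT) returns it, and that value is a reasonable floor for the delay between requests.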

Also, you didn't mention the purpose of the webcrawler, so I'm not sure of the relevance here, but you should be aware that there has been some debate over the ethics of archiving public social media posts that were intended to be ephemeral:


Lisa
-- 

Lisa Ho
IT Policy Manager




On Wed, Mar 20, 2013 at 10:46 AM, Jack BURRIS <[hidden email]> wrote:
Howdy,
I have a PhD student who would like to run a webcrawler on an external social media site for ten days.  Are there campus policies regarding this?  I'm pretty sure he is planning to run it with a named user agent.  

Your thoughts, suggestions?

Thanks,
Jack Burris
SSDL/SSCL




 