Understanding and Mitigating the Impact of Web Robot and IoT Traffic on Web Systems
Revision as of 19:12, 23 October 2015
Contents
Introduction & Motivation
Overview
Research Tasks
Robot classification
Automatic separation of robots into functional classes
Automatic workload-based classification of robots
Traffic Generation
Robot arrival process
Assigning resource requests
Robot-resilient Web caching
Ideally, a Web cache would predict the exact resource that a Web robot session will request next. This is infeasible because of the large number of resources available on a Web server. Even predicting only the file extension of the next resource would require a model to choose among hundreds of extensions, which is challenging for a lightweight classifier operating in real time. Instead, we follow previous work <ref name="robotAnalysis" /> and cluster resources into types based on their extensions. Predicting the type of the next resource is a more tractable alternative: because the popularity of resources requested by robots exhibits a power tail <ref name="detectingRobots" />, the most popular resources of the predicted type are the ones most likely to be requested next.
Class | Extensions |
---|---|
text | txt, xml, sty, tex, cpp, java |
web | asp, jsp, cgi, php, html, htm, css, js |
img | tiff, ico, raw, pgm, gif, bmp, png, jpeg, jpg |
doc | xls, xlsx, doc, docx, ppt, pptx, pdf, ps, dvi |
av | avi, mp3, wvm, mpg, wmv, wav |
prog | exe, dll, dat, msi, jar |
compressed | zip, rar, gzip, tar, gz, 7z |
malformed | request strings that are not well-formed |
noExtension | requests for directory contents |
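The mapping in the table can be sketched as a simple lookup. The following is a minimal illustration, not the project's actual implementation: the function name `resource_class` is hypothetical, the rule for what counts as a malformed request (anything that is not a string starting with `/`, or that fails to parse) is an assumption, and extensions not listed in the table return `None` because the table does not say how they are handled.

```python
import posixpath
from urllib.parse import urlsplit

# Extension lists copied from the table above.
EXTENSION_CLASSES = {
    "text": ["txt", "xml", "sty", "tex", "cpp", "java"],
    "web": ["asp", "jsp", "cgi", "php", "html", "htm", "css", "js"],
    "img": ["tiff", "ico", "raw", "pgm", "gif", "bmp", "png", "jpeg", "jpg"],
    "doc": ["xls", "xlsx", "doc", "docx", "ppt", "pptx", "pdf", "ps", "dvi"],
    "av": ["avi", "mp3", "wvm", "mpg", "wmv", "wav"],
    "prog": ["exe", "dll", "dat", "msi", "jar"],
    "compressed": ["zip", "rar", "gzip", "tar", "gz", "7z"],
}

# Invert the table so that each extension points at its class.
EXT_TO_CLASS = {
    ext: cls for cls, exts in EXTENSION_CLASSES.items() for ext in exts
}


def resource_class(request_target):
    """Map a request target such as '/a/b.php?x=1' to a resource class."""
    # Assumption: a well-formed request target is a string starting with '/'.
    if not isinstance(request_target, str) or not request_target.startswith("/"):
        return "malformed"
    try:
        path = urlsplit(request_target).path
    except ValueError:
        # urlsplit rejects some inputs (for example, broken IPv6 brackets).
        return "malformed"
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    if not ext:
        # No extension: treated as a request for directory contents.
        return "noExtension"
    # Assumption: extensions missing from the table are left unclassified.
    return EXT_TO_CLASS.get(ext)
```

For example, `resource_class("/index.php?id=3")` yields `web`, and `resource_class("/docs/")` yields `noExtension`. Only the final extension is considered, so `/a.tar.gz` falls under `compressed` via `gz`.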
Classification Algorithms
To predict the type of a Web robot request, we consider algorithms that predict the type of the nth resource requested given the sequence of the previous n - 1 request types. A training record is denoted r_i = (v_i, l_i), where v_i is the ordered sequence of the previous n - 1 request types and l_i = x_n is the type of the resource requested immediately after the sequence v_i. Figure [ID] shows an example with n = 10. The first record consists of the types of the first nine requests, and its class label is the type of the tenth request; the second record consists of the types of the second through tenth requests, and its label is the type of the eleventh request. The trained predictor maintains a history of the previous n - 1 request types and, based on this history, generates the predicted label for the next request.
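The record construction described above amounts to sliding a window of length n over a session's sequence of request types. The sketch below illustrates this under stated assumptions: the names `make_records` and `RequestHistory` are hypothetical, and the predictor itself is left out because the text does not fix a particular classification algorithm.

```python
from collections import deque


def make_records(types, n):
    """Build training records (v_i, l_i) from one session's request types.

    v_i holds n - 1 consecutive request types, and l_i is the type of the
    request that immediately follows them.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    return [
        (tuple(types[i:i + n - 1]), types[i + n - 1])
        for i in range(len(types) - n + 1)
    ]


class RequestHistory:
    """Rolling buffer of the previous n - 1 request types for one session."""

    def __init__(self, n):
        if n < 2:
            raise ValueError("n must be at least 2")
        self.window = deque(maxlen=n - 1)

    def observe(self, request_type):
        # The oldest type is dropped automatically once the buffer is full.
        self.window.append(request_type)

    def features(self):
        # Return None until n - 1 requests have been seen.
        if len(self.window) < self.window.maxlen:
            return None
        return tuple(self.window)


session = ["web", "img", "img", "web", "doc", "web",
           "img", "text", "web", "img", "web"]
records = make_records(session, n=10)
```

With the 11 request types in `session` and n = 10, `make_records` produces two records: the first pairs requests one through nine with the type of the tenth, and the second pairs requests two through ten with the type of the eleventh. At prediction time, the tuple returned by `RequestHistory.features()` has the same shape as v_i and could be passed to whichever trained classifier is in use.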
Datasets
Results & Analysis
Acknowledgement
This paper is based on work supported by the National Science Foundation (NSF) under Grant No. 1464104. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.
References
<ref name="robotAnalysis"> D. Doran, “Detection, classification, and workload analysis of web robots,” Ph.D. dissertation, University of Connecticut, 2014.</ref> <ref name="detectingRobots"> D. Doran and S. Gokhale, “Detecting Web Robots Using Resource Request Patterns,” in Proc. of Intl. Conference on Machine Learning and Applications, 2012, pp. 7–12.</ref> <references/>