{"id":13024,"date":"2021-12-02T20:58:42","date_gmt":"2021-12-03T01:58:42","guid":{"rendered":"https:\/\/carleton.ca\/scs\/?page_id=13024"},"modified":"2026-06-02T14:59:24","modified_gmt":"2026-06-02T18:59:24","slug":"tr-00-04-structural-characterization-of-popular-web-documents","status":"publish","type":"page","link":"https:\/\/carleton.ca\/scs\/research\/scs-technical-reports\/technical-reports-2000\/tr-00-04-structural-characterization-of-popular-web-documents\/","title":{"rendered":"TR-00-04: Structural Characterization of Popular Web Documents"},"content":{"rendered":"\n<section class=\"w-screen px-6 cu-section cu-section--white ml-offset-center md:px-8 lg:px-14\">\n    <div class=\"space-y-6 cu-max-w-child-5xl  md:space-y-10 cu-prose-first-last\">\n\n            <div class=\"cu-textmedia flex flex-col lg:flex-row mx-auto gap-6 md:gap-10 my-6 md:my-12 first:mt-0 max-w-5xl\">\n        <div class=\"justify-start cu-textmedia-content cu-prose-first-last\" style=\"flex: 0 0 100%;\">\n            <header class=\"font-light prose-xl cu-pageheader md:prose-2xl cu-component-updated cu-prose-first-last\">\n                                    <h1 class=\"cu-prose-first-last font-semibold !mt-2 mb-4 md:mb-6 relative after:absolute after:h-px after:bottom-0 after:bg-cu-red after:left-px text-3xl md:text-4xl lg:text-5xl lg:leading-[3.5rem] pb-5 after:w-10 text-cu-black-700 not-prose\">\n                        TR-00-04: Structural Characterization of Popular Web Documents\n                    <\/h1>\n                \n                                \n                            <\/header>\n\n                    <\/div>\n\n            <\/div>\n\n    <\/div>\n<\/section>\n\n<p>Carleton University<br>\n<a href=\"https:\/\/carleton.ca\/scs\/research\/scs-technical-reports\/technical-reports-2000\/\">Technical Report<\/a> TR-00-04<br>\nMay 2000<\/p>\n\n\n\n<h2 id=\"structural-characterization-of-popular-web-documents\" class=\"wp-block-heading\">Structural Characterization of 
Popular Web Documents<\/h2>\n\n\n\n<div class=\"tr_t3\">\n<div class=\"tr_t3\">\n<div class=\"tr_t3\">\n<div class=\"tr_t3\">\n<div class=\"tr_t3\">\n<div class=\"tr_t3\">\n<div class=\"tr_t3\">\n<div class=\"tr_t3\">\n<div class=\"tr_t3\">Abdolreza Abhari, Sivarama P. Dandamudi, Shikharesh Majumdar<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div>\n<h3>Abstract<\/h3>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<p>Characterization of Web documents is essential for studying performance issues such as minimizing the demands on back-end servers and reducing communication overheads. In addition, characterization of the Web is important for devising synthetic workload generators for use in investigating effective resource management algorithms. Most characterizations of Web documents are based on individual Web files, without considering their inherent structure. To display a complete Web page, a collection of files, including the files corresponding to the objects embedded in the page, must be transferred. A Web object is defined to be this collection, i.e., a Web page together with its embedded files. Our goal in conducting this study is to collect data on the structure and size of Web objects; such data are particularly useful for improving Web server performance through techniques such as file clustering, parallel I\/O, and data caching at client sites. We report the results of an empirical study conducted on several popular Web sites ranked among the top 100. We chose popular Web sites for this investigation because they are more likely to be efficiently designed. In addition, popular Web servers account for a significant portion of network traffic. 
We also study the trace of a busy proxy access log to characterize Web objects for regular Web environments.<\/p>\n\n\n\n<p><a href=\"https:\/\/carleton.ca\/scs\/wp-content\/uploads\/sites\/260\/TR-00-04.pdf\">TR-00-04.pdf<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Carleton University Technical Report TR-00-04 May 2000 Structural Characterization of Popular Web Documents Abdolreza Abhari, Sivarama P. Dandamudi, Shikharesh Majumdar Abstract Characterization of Web documents is essential to study performance issues such as minimizing demands on the back-end servers and communication overheads. In addition, characterization of the Web is also important to devise synthetic workload [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":12258,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_acf_changed":false,"_cu_dining_location_slug":"","footnotes":"","_links_to":"","_links_to_target":""},"cu_page_type":[],"class_list":["post-13024","page","type-page","status-publish","hentry"],"acf":{"cu_post_thumbnail":false},"_links":{"self":[{"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/pages\/13024","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/comments?post=13024"}],"version-history":[{"count":1,"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/pages\/13024\/revisions"}],"predecessor-version":[{"id":13025,"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/pages\/13024\/revisions\/13025"}],"up":[{"embeddable":true,"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/pages\/12258"}],"wp:attachment":[{"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/media?parent=13024"}],"wp:te
rm":[{"taxonomy":"cu_page_type","embeddable":true,"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/cu_page_type?post=13024"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}