A weblog is defined as a web page with minimal or no outside editing that provides online commentary, is periodically updated, and is presented in reverse chronological order, with hyperlinks to other online sources. Blogs can function as personal diaries, technical advice columns, sports chat, celebrity gossip, political commentary, or all of the above.

The growth of blogging on the Internet, that is, the creation of journal-like web page logs, has produced a highly dynamic subset of the World Wide Web (WWW) that evolves and responds to real-world events. Indeed, blogs (or weblogs) have recently emerged as a new popular publishing medium. The size of the so-called blog sphere (or blogosphere), which is the collection of blogs on the Internet, has been rising exponentially for the last few years, with the number of blogs tracked by Technorati doubling every six months.


The growth of the blogosphere has led to the creation of new search engines that track and search the contents of blogs, serving the need of Internet users for effective retrieval tools for the blogosphere. Today there are several blog search engines: some are dedicated to searching blogs, such as Blogdigger, BlogPulse and Technorati, and some are specialized services from the main Web search engines, such as Google, Yahoo! and Ask Jeeves.

Blogs have many interesting features: entries are added in chronological order, sometimes at a high volume. In addition, many blogs are created by their authors not for any sizable audience, but purely as a means of self-expression. Readily available blog software has opened the act of blogging to a wide-ranging audience, whose blogs reflect their opinions, philosophies and emotions. Traditional media tend to focus on “heavy-hitting” blogs devoted to politics, punditry and technology. However, there are many different genres of blogs, some written around a specific topic, some covering several, and others talking about personal daily life.



There are numerous definitions of blogs other than the one given in Section 1. Some are shown below.

• From “Web log:” A blog is basically a diary that is available on the web. The activity of updating a blog is “blogging” and someone who keeps a blog is a “blogger.” (

• An easily updated personal website, generally updated daily and expressing opinions. (

• Web pages that are constantly updated with new commentary and links relating to a particular topic. (

That is, a blog is a website that anybody can easily update and use to express his/her own opinions in a public space. To put it another way, blogs are a storehouse of information that reflects public opinion. Although there is a lot of trivial information in blog space, there is also a lot of important information. Before defining our model of blog thread data, let us discuss these definitions of blog sites and blog entries. Examples are shown in Figures a and b.

Site = (site URL, RSS, blogger, site Name, entry)
Entry = (permalink, blogger, time, title, description, comment)
Comment = (blogger, permalink, content, time)
A blog site has a site URL, an RSS (Really Simple Syndication) feed, a site name, and entries, and is managed by one or more bloggers. A blog entry has a permalink for access, a publication time, a title, and an entry description. A comment includes its content and the time when it was written, and is posted by a blogger to an entry identified by a permalink.
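
As an illustration only, the three tuples above could be represented as simple record types. This is a minimal sketch: the field names and types are assumptions chosen to mirror the tuples, not a schema given in the text.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Comment:
    blogger: str      # author of the comment
    permalink: str    # permalink of the entry the comment is attached to
    content: str
    time: datetime    # when the comment was written


@dataclass
class Entry:
    permalink: str
    blogger: str
    time: datetime    # publication time
    title: str
    description: str
    comments: List[Comment] = field(default_factory=list)


@dataclass
class Site:
    site_url: str
    rss: str                    # URL of the RSS feed (assumed to be a URL)
    bloggers: List[str]         # a site is managed by one or more bloggers
    site_name: str
    entries: List[Entry] = field(default_factory=list)
```

Keeping comments as a list on the entry, and entries as a list on the site, follows the containment described above.
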
Reply Link = (ei, ej), (ei → ej)
Trackback Link = (ei, ej), (ei → ej)
Source Link = (ei, wi), (ei → wi)
where ei, ej ∈ E, E is the set of blog entries, wi ∈ W, and W is the set of Web pages other than blog entries.


Figure (a) Blog site example


Figure (b) Blog entry example

Reply Links and source Links are hyperlinks that appear in the description of a blog entry and point to other blog entries or web pages. They do not include automatically added hyperlinks, such as links to the previous or next entry in the blog site, or other links unrelated to the content of the description of the blog entry. We exclude these in order to remove link noise, such as links that appear on every page. Here, a trackback Link is a special case of a reply Link.
For trackback Link = (ei, ej), there is not only a reply Link of (ei → ej) but also a link of (ej → ei) that indicates the existence of the reply Link.
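
The three link types can be separated mechanically once the hyperlinks in each entry description are known. The sketch below is illustrative and rests on two assumptions: `content_links` already excludes automatically added navigation links, and `trackback_pings` is a hypothetical set of (ej, ei) pairs recording that entry ej displays a link back to ei.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]


def classify_links(
    content_links: Dict[str, List[str]],
    trackback_pings: Set[Link],
) -> Tuple[Set[Link], Set[Link], Set[Link]]:
    """Split hyperlinks into reply, source and trackback links.

    content_links maps an entry permalink to the URLs found in its
    description. Any target that is itself a known entry yields a reply
    link; any other target yields a source link.
    """
    entries = set(content_links)
    reply: Set[Link] = set()
    source: Set[Link] = set()
    for src, targets in content_links.items():
        for dst in targets:
            if dst == src:
                continue  # ignore self-links
            if dst in entries:
                reply.add((src, dst))
            else:
                source.add((src, dst))
    # A trackback link is a reply link (ei -> ej) that is also
    # acknowledged by a link back from ej to ei.
    trackback = {(a, b) for (a, b) in reply if (b, a) in trackback_pings}
    return reply, source, trackback
```

For example, if entry `e1` links to entry `e2` and to a news page, and `e2` shows a trackback to `e1`, then (`e1`, `e2`) is both a reply link and a trackback link, while the link to the news page is a source link.
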

A blog thread is composed of entries connected via reply Links that form a discussion among bloggers. There is one exception. As Fig. c indicates, sets of entries that are not connected to each other via reply Links are regarded as belonging to the same thread if they refer to the same website via a source Link. Comments attached to a blog entry are not used to extract blog threads: our aim is to identify important bloggers by analyzing blog threads, so comment authors whose blog sites cannot be identified are of little importance. Namely, a blog thread is a directed connected graph and is defined as follows.

Thread := (V, L)
V = W ∪ E, L = Ls ∪ Lr
W is a set of websites. E is a set of blog entries.
Ls ⊆ {(e, e’) | e ∈ E, e’ ∈ W}
Lr ⊆ {(e, e’) | e ∈ E, e’ ∈ E}
Ls corresponds to the set of source Links.
Lr corresponds to the set of reply Links.
Ideally, the entries in a blog thread should share common topics. However, topics sometimes change at a particular point. In the future, we will separate a blog thread at that point even if both parts are connected via reply Links.
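
One way to read the thread definition above is that each thread is a connected component of the graph (V, L) when link direction is ignored. The following is a minimal sketch under that assumption; the function name and the use of plain strings as node identifiers are illustrative choices, and topic-based splitting is not attempted.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Link = Tuple[str, str]


def extract_threads(
    reply_links: Iterable[Link], source_links: Iterable[Link]
) -> List[Set[str]]:
    """Group entries and websites into threads.

    Nodes are joined whenever a reply link or a source link connects
    them, regardless of direction, so entries that cite the same
    website end up in the same thread.
    """
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for a, b in list(reply_links) + list(source_links):
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen: Set[str] = set()
    threads: List[Set[str]] = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component: Set[str] = set()
        stack = [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        threads.append(component)
    return threads
```

Entries with no reply or source links at all do not appear in any thread here; whether such isolated entries should count as single-entry threads is left open.
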


Figure (c) Blog thread

The networked structure of the blogosphere

Perhaps the most important difference between blogs and more traditional media is that blogs are networked phenomena that rely on hyperlinks. Some blogs consist of little more than lists of hyperlinks; others include lengthy commentaries. All blogs by definition link to other sources of information, including, most pertinently, other blogs. The universe of blogs is conventionally referred to as the “blogosphere.”

Links between blogs take two forms. First, many bloggers maintain a “blog roll” on their website: a list of blogs that they frequently read or especially admire, with clickable links to the general URLs (web addresses) of those blogs. Blog rolls usually occupy a permanent position on the blog’s home page. Blog rolls provide an excellent means of situating a blogger’s interests and preferences within the blogosphere; bloggers are likely to use their blog rolls to link to other blogs that share their interests. Second, bloggers may write specific posts that contain hyperlinks to other blogs. Unlike links in the blog roll, links within posts will be archived as new posts replace old ones over time.

Typically, such posts themselves link directly to a specific post on the other blog (rather than the blog’s general URL address), perhaps also providing some commentary on that post.
Posts commenting on posts are a key form of information exchange in the blogosphere. Although they mean that discussions in the blogosphere can often have the characteristics of an echo chamber (bloggers commenting on bloggers commenting on bloggers), they also allow for a means of rough-and-ready information filtering.

Links and page views are the currency of the blogosphere. Many bloggers desire a wide readership. Conventional wisdom suggests that “the most reliable way to gain traffic [readership] is through a link on another weblog.” This stems from the nature of hypertext. Ceteris paribus, when one blog links to another, the readers of the former blog are more likely to read the latter after having clicked on a hyperlink than they would have been otherwise. If they like what they read, they may even become regular readers of the second blog.

Blogs as Social Network

Social network analysis is the quantitative study of the interactions between individuals or organizations. Social network analysts represent interactions as graphs in which individuals or organizations are portrayed as nodes (also referred to as actors or vertices) and their connections to one another as edges (also referred to as ties or links). By quantifying social structures, social network analysts can determine the most important nodes in the network.
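
As a simple illustration of quantifying importance in such a graph, the sketch below ranks nodes by in-degree, that is, by the number of edges pointing at them. In-degree is only one of many possible importance measures and is used here as an assumption for the sake of the example.

```python
from collections import Counter
from typing import List, Tuple


def in_degree_ranking(edges: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Rank nodes by the number of incoming edges (highest first)."""
    indegree: Counter = Counter(dst for _, dst in edges)
    for src, _ in edges:
        indegree[src] += 0  # make sure nodes with no incoming edges appear
    # Sort by descending in-degree, breaking ties alphabetically.
    return sorted(indegree.items(), key=lambda item: (-item[1], item[0]))
```
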


Applying social network analysis methods to the blogosphere has revealed interesting findings about how individuals share information and interact socially online. For example, the linking patterns of blogs can be used to infer paths of information flow through the blogosphere (E. Adar and L. Adamic: 2005). Adamic and Glance (2005) looked at the network structure of political blogs during the 2004 U.S. Presidential election and found that the linking patterns between conservative and liberal blogs formed two fairly separate communities (L. Adamic and N. Glance: 2005). Moreover, Herring et al. found that blogs that link to each other and are part of the same community have a tendency to mention one another in blog posts and to converse with one another through comments. These references between blog pairs occurred “on multiple occasions, suggesting the existence of a relationship between them, not just a one-time exchange” (Herring et al. 2005).

Computer-mediated communication and Relationship formation

The effect of the Internet on its users’ real-world social interactions and involvement has been a topic of debate. Kraut et al. found that as participants used the Internet more frequently, they communicated with family members less often and experienced more instances of depression and loneliness (R. Kraut, M. Patterson, V. Lundmark, S. Kiesler, T. Mukopadhyay, and W. Scherlis, "Internet parado: 1998). The study also found that the size of participants’ social networks decreased with prolonged Internet use. Moreover, Cummings et al. argued that online relationships were not as intimate as those maintained through face-to-face or telephone communication. In reviewing email and Listserv communication, they observed that “social places on the Internet where close personal relationships are formed and maintained are rare.” (J. Cummings, B. Butler, and R. Kraut : 2002)

On the other hand, a great number of studies have shown that Internet use can help individuals develop satisfying relationships. In a follow-up study, Kraut found that Internet use had a positive effect on communication and social involvement. Wellman points out that the Internet enables individuals to retain active social ties and develop new social ties with others sharing similar interests. He also argues that “the more contact people have online, the greater the impression they make on each other.” McKenna conducted surveys of online newsgroup posters that showed that some close online relationships naturally progress to face-to-face interactions and that a majority of those surveyed experienced an increase in their social networks due to online connections, regardless of the size of their social networks. (K. McKenna, A. Green, and M. Gleason, 2002)

Community structure

Our study focuses on three blog communities, each with a central site containing a listing of blogs. We now use network analysis to determine to what extent each set of blogs actually forms a community by engaging in reciprocal, group interaction. But first we examine the different forms of interaction that can take place.

Blog ties: blog rolls, citations, and comments

Social relationships can be expressed online as different forms of blog ties:

Blog roll links are usually located in the blog’s sidebar and point to other blogs that the author may read or simply want to always include on her main page. Blog rolls are normally updated infrequently.

Citation links are made by bloggers within their individual posts and can reference an entire blog or just a particular post on that blog. By their nature, they occur at a fixed point in time, but may be repeated. Repeated citations can serve as a weight for the tie, with more frequent citations representing a greater interest of one blog in another.

Comment links are not necessarily hyperlinks, but an interaction that occurs when one person, probably a blogger, adds a comment to another blogger’s post.
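
The idea that repeated citations act as a weight on a tie can be sketched by counting how often one blog cites another. The function name and the input format, a list of (citing blog, cited blog) pairs with one pair per citation, are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def citation_weights(citations: Iterable[Tuple[str, str]]) -> Counter:
    """Count citations per ordered blog pair, ignoring self-citations.

    The count serves as the weight of the directed tie between the
    citing blog and the cited blog.
    """
    return Counter((src, dst) for src, dst in citations if src != dst)
```

A blog roll tie, by contrast, would naturally be unweighted, since a blog either appears in another blog’s roll or does not.
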

For both blog rolls and citations, the communication is indirect. It occurs on the blog with the blog roll or citing blog post, but may be noticed by the blog being referenced through blog search engines, server logs, or through Trackbacks. Trackbacks allow the citing blog to inform the blog receiving the citation that their post has been discussed. The receiving blog will normally display the Trackback, along with summary text of the citing post. Readers are then able to follow conversation across several blogs by traversing Trackbacks.

Density and centralization

The primary measure of community we consider is the density of links between blogs within the community. The greater blogosphere is vast, and the chance of a link landing randomly within one of these small communities is negligible. We find, however, that blogs link on average to one or more other blogs within the community. Blogroll links are more numerous than post citations.
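
For concreteness, the density of a directed network is commonly computed as the number of links present divided by the number of possible ordered pairs, n(n - 1). The sketch below uses that standard formula, which is assumed here rather than quoted from the study, together with the mean number of within-community links per blog.

```python
from typing import Iterable, Set, Tuple

Link = Tuple[str, str]


def _unique_links(links: Iterable[Link]) -> Set[Link]:
    """Drop duplicate links and self-links."""
    return {(src, dst) for src, dst in links if src != dst}


def density(num_blogs: int, links: Iterable[Link]) -> float:
    """Fraction of possible directed links that are actually present."""
    if num_blogs < 2:
        return 0.0
    return len(_unique_links(links)) / (num_blogs * (num_blogs - 1))


def mean_out_degree(num_blogs: int, links: Iterable[Link]) -> float:
    """Average number of other community blogs each blog links to."""
    if num_blogs == 0:
        return 0.0
    return len(_unique_links(links)) / num_blogs
```
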

Centralization measures the equality of the distribution of links. In all communities, there is greater centralization of outdegree, meaning that a small number of blogs have very long blogrolls and/or cite other blogs often, while many list few or no other community blogs in their blogrolls.
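
A common way to quantify this is Freeman’s degree centralization, which compares every node’s degree with the largest degree in the network and normalizes by the value obtained for a perfect star. The text does not say which formula was used, so the choice of Freeman’s measure in this sketch is an assumption.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def outdegree_centralization(
    blogs: List[str], links: Iterable[Tuple[str, str]]
) -> float:
    """Freeman centralization of outdegree for a directed network.

    Returns 1.0 when a single blog links to every other blog and no one
    else links at all, and 0.0 when all blogs have the same outdegree.
    """
    n = len(blogs)
    if n < 2:
        return 0.0
    unique = {(src, dst) for src, dst in links if src != dst}
    outdegree = Counter(src for src, _ in unique)
    max_degree = max(outdegree[b] for b in blogs)
    total_gap = sum(max_degree - outdegree[b] for b in blogs)
    # In a directed star, the hub has outdegree n - 1 and the other
    # n - 1 blogs have outdegree 0, giving a maximum gap of (n - 1)^2.
    return total_gap / ((n - 1) ** 2)
```

The same function applied to reversed links would give the centralization of indegree.
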


It is possible for one blog to link to many others, but not receive links in kind. Similarly, a blog may be very popular but not reciprocate the attention it is receiving. If there is true interaction and relationship formation within the community, the ties will be mutual, which we found true for all three communities. Reciprocal blog roll links probably indicate only a mutual awareness, whereas reciprocal post citations imply that both blogs actively discussed or linked to one another in their posts, a less likely event given that post citations are fewer in number. It is therefore unsurprising to find that a greater proportion of blog roll links are reciprocated than post citations. In general, bloggers do tend to return blog roll links merely as a courtesy. (E. Berscheid and H. T. Reis, 1998.)
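
Reciprocity can be measured as the share of directed links whose reverse link also exists. This is one common definition, assumed here for illustration; the study may have counted reciprocated pairs differently.

```python
from typing import Iterable, Tuple


def reciprocity(links: Iterable[Tuple[str, str]]) -> float:
    """Fraction of directed links that are returned by the target blog."""
    unique = {(src, dst) for src, dst in links if src != dst}
    if not unique:
        return 0.0
    mutual = sum(1 for (src, dst) in unique if (dst, src) in unique)
    return mutual / len(unique)
```

Computing this separately for the blog roll network and the citation network would allow the two proportions discussed above to be compared directly.
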

Social roles in blog comments

When people leave comments on a blog post, they are indicating that they read the blog post and have some feedback to offer the blogger. Since they are more direct and often represent ongoing interaction, the interactions that bloggers have with one another through comments may be more indicative of the depth of their relationship than other blog ties. Along with tracking the blog roll and citation networks for the three communities, we also examined the comments network of the community. As we examined the interactions in the comments network, we observed several central hubs receiving the majority of the blog comments in the network and a few of the bloggers in the community posting a disproportionate number of comments. (F. Viégas and M. Smith, 2004)
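
Observations of this kind can be checked by counting comments posted and received per blogger and asking what share the most active few account for. The sketch below assumes a hypothetical input of (commenter, post author) pairs, one per comment.

```python
from collections import Counter
from typing import List, Tuple


def comment_activity(
    comments: List[Tuple[str, str]]
) -> Tuple[Counter, Counter]:
    """Return (comments posted per blogger, comments received per blogger)."""
    posted = Counter(commenter for commenter, _ in comments)
    received = Counter(author for _, author in comments)
    return posted, received


def top_share(counts: Counter, k: int) -> float:
    """Share of all comments accounted for by the k highest counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(count for _, count in counts.most_common(k)) / total
```

A high value of `top_share(received, k)` for small `k` would correspond to a few hubs receiving most comments, and a high value of `top_share(posted, k)` to a few bloggers posting a disproportionate number of them.
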

Example of Blog Application

This section shows a way of inserting important supplementary information with immediacy when browsing news content.
Although well-known news sites and TV programs may deliver important information with immediacy, they cannot cover all the information relating to a news topic. We feel that it is important to provide a variety of supplementary information to audiences to avoid presenting them with biased opinions. For example, the information provided could include entries by important bloggers, part of a blog thread, related news referred to by an important blogger, and so on. Even when using a conventional technique such as a search engine, the system cannot pre-crawl supplementary information on a news item because the information may not have existed before the news event occurred. We therefore propose the application shown in the figure: a system that inserts important supplementary information with immediacy when a user browses news content.

The operation of the application system is outlined below.

1. The system identifies important bloggers on particular topics.

2. When news content is delivered, the system estimates the topic of the news content.

3. The system crawls the blog data from important bloggers related to the news content.

4. The crawled blog data are categorized using a clustering method.

5. The system provides blog data from important bloggers that differ to some degree from the news content, because data that is the same as news content is not useful as supplementary information.
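
Step 5 above can be sketched as a similarity filter that keeps blog entries which are related to the news item but not near-duplicates of it. Everything in this example is an assumption made for illustration: the use of Jaccard similarity over word sets, the function names, and the `low` and `high` thresholds are not specified in the text.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase a text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_supplementary(
    news: str, entries: List[str], low: float = 0.1, high: float = 0.8
) -> List[str]:
    """Keep entries that overlap with the news text but do not repeat it.

    Entries below `low` are treated as unrelated and entries at or above
    `high` as near-duplicates of the news content; both thresholds are
    hypothetical and would need tuning.
    """
    news_tokens = tokens(news)
    return [
        entry
        for entry in entries
        if low <= jaccard(news_tokens, tokens(entry)) < high
    ]
```

In a fuller system the entries passed in would come from the important bloggers found in steps 1 to 3, and the clustering of step 4 could be applied before or after this filter.
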

In future work, we plan to work on identifying important bloggers for various topics, detecting topics in news content, crawling blog data from important bloggers, clustering blog data and presenting information that supplements news content.



In this paper we examined three blog communities in different geographical locations, both by analyzing the network structure of their blog rolls, citations, and comments, and by surveying the bloggers directly. In all three communities, there is strong evidence that blogs do enable relationship formation, with some of those new relationships later extending to other communication media and offline meetings. On the other hand, blogs do not play a large role in helping bloggers sustain their real life relationships; nonetheless, this finding may be due to blogging’s relatively young age.

Although previous blog studies have typically placed more emphasis on blog rolls and citations, we find that much of the community interaction occurs in comments and is not always reflected in blog rolls and citations. In examining the comments dataset, we propose a typology of social roles in commenting: conversation-starters and conversation-supporters. In general, all three communities show high degrees of reciprocity and cohesion.


