Blogs and Blog Thread Data Model


There are several definitions of blogs other than the one given in Section 1. Some are shown below.
• From “web log”: a blog is basically a journal that is available on the web. The activity of updating a blog is “blogging,” and someone who keeps a blog is a “blogger.”
• An easily updated personal website, generally updated daily and expressing opinions.
• Web pages that are constantly updated with new commentary and links relating to a particular topic. Often very personal.
That is, a blog is a website that anybody can easily update and use to express his or her own opinions in a public space. To put it another way, blogs are a storehouse of information that reflects public opinion. Although there is a lot of trivial information in blog space, there is also a lot of important information. Before defining our model of blog thread data, let us define blog sites, blog entries, and comments. Examples are shown in Figures a and b.
Site = (site URL, RSS, blogger, site name, entry)
Entry = (permalink, blogger, time, title, description, comment)
Comment = (blogger, permalink, content, time)
A blog site has a site URL, an RSS (really simple syndication) feed, a site name, and entries, and is managed by one or more bloggers. A blog entry has a permalink for access, a publication time, a title, and an entry description. A comment consists of its content and the time when it was written; a blogger posts a comment to an entry identified by a permalink.
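The three tuples above map naturally onto record types. The following is a minimal sketch in Python, not the authors' implementation: the field names, the use of plain strings for bloggers and timestamps, and the example URLs are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    blogger: str    # author of the comment
    permalink: str  # permalink of the entry the comment is attached to
    content: str
    time: str       # timestamp kept as a plain string in this sketch


@dataclass
class Entry:
    permalink: str
    blogger: str
    time: str
    title: str
    description: str
    comments: List[Comment] = field(default_factory=list)


@dataclass
class Site:
    site_url: str
    rss: str
    bloggers: List[str]  # a site is managed by one or more bloggers
    site_name: str
    entries: List[Entry] = field(default_factory=list)


# Hypothetical example data.
entry = Entry("http://blog.example/1", "alice", "2006-01-01T10:00",
              "Hello", "First post")
entry.comments.append(
    Comment("bob", entry.permalink, "Nice post", "2006-01-01T11:00"))
site = Site("http://blog.example/", "http://blog.example/rss",
            ["alice"], "Alice's blog", [entry])
```

A comment carries the permalink of the entry it belongs to, which mirrors the Comment tuple and lets comments be stored separately from entries if needed.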
Reply Link = (ei, ej), (ei → ej)
Trackback Link = (ei, ej), (ei → ej)
Source Link = (ei, wi), (ei → wi)
where ei, ej ∈ E, E is a set of blog entries, wi ∈ W, and W is a set of web pages other than blog entries.


Figure (a) Blog site example


Figure (b) Blog entry example

Reply Links and source Links are hyperlinks that appear in the description of a blog entry and point to other blog entries or to web pages, respectively. They do not include automatically added hyperlinks, such as links to the previous or next entry in the blog site, or other links unrelated to the content of the description. We exclude these to remove link noise, such as links that all pages point to. A trackback Link is a special case of a reply Link.
For a trackback Link = (ei, ej), there is not only a reply Link (ei → ej) but also a link (ej → ei) that indicates the existence of the reply Link.
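To make the three link types concrete, the sketch below classifies hyperlinks under stated assumptions: `entry_links` is a hypothetical mapping from each entry's permalink to the URLs found in its description (with automatically added links already removed), and `back_links` is a hypothetical set of (ej, ei) pairs recording that entry ej displays a link back to ei. Neither name comes from the original text.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]  # (source permalink, target URL)


def classify_links(
    entry_links: Dict[str, List[str]],
    back_links: Set[Link],
) -> Tuple[Set[Link], Set[Link], Set[Link]]:
    """Return (reply, trackback, source) link sets."""
    entries = set(entry_links)  # E: the known blog entries
    reply: Set[Link] = set()
    source: Set[Link] = set()
    for e, targets in entry_links.items():
        for t in targets:
            if t == e:
                continue  # ignore self-references
            if t in entries:
                reply.add((e, t))   # entry -> entry
            else:
                source.add((e, t))  # entry -> other web page
    # A trackback link is a reply link whose target also links back.
    trackback = {(ei, ej) for (ei, ej) in reply if (ej, ei) in back_links}
    return reply, trackback, source


# Hypothetical example: e2 replies to e1 with a trackback, e3 replies to e2.
entry_links = {"e1": ["w1"], "e2": ["e1", "w1"], "e3": ["e2"]}
back_links = {("e1", "e2")}
reply, trackback, source = classify_links(entry_links, back_links)
```

Any URL that is not a known entry permalink is treated as a member of W here, which is a simplification: in practice the set of known entries depends on how many blog sites have been crawled.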

A blog thread is composed of entries that are connected via reply Links and form a discussion among bloggers. There is one exception: as Fig. c indicates, sets of entries that are not connected to each other via reply Links are regarded as the same thread if they refer to the same website via a source Link. Comments attached to a blog entry are not used to extract blog threads. Our goal is to identify important bloggers by analyzing blog threads, so a comment author whose blog site cannot be identified is not very important. A blog thread is thus a directed connected graph, defined as follows.
Thread := (V, L)
V = W ∪ E, L = Ls ∪ Lr
W is a set of websites. E is a set of blog entries.
Ls ⊆ {(e, e′) | e ∈ E, e′ ∈ W}
Lr ⊆ {(e, e′) | e ∈ E, e′ ∈ E}
Ls corresponds to a set of source Links.
Lr corresponds to a set of reply Links.
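One way to read this definition operationally is to treat each thread as a connected component of the graph (V, L) when link direction is ignored. The sketch below uses a union-find structure for this; it illustrates the definition under that assumption and is not necessarily the extraction procedure the authors used. The function name and the example identifiers are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]


def extract_threads(
    reply: Set[Link], source: Set[Link]
) -> List[Tuple[Set[str], Set[Link]]]:
    """Group links into threads (V, L), one per connected component."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    links = reply | source  # L = Lr ∪ Ls
    for a, b in links:
        parent[find(a)] = find(b)  # merge the two components

    threads: Dict[str, Tuple[Set[str], Set[Link]]] = {}
    for a, b in links:
        vertices, edges = threads.setdefault(find(a), (set(), set()))
        vertices.update((a, b))
        edges.add((a, b))
    return list(threads.values())


# Hypothetical example: e1 and e3 are not connected by reply links,
# but both cite web page w1, so they fall into the same thread.
reply = {("e2", "e1")}
source = {("e1", "w1"), ("e3", "w1"), ("e4", "w2")}
threads = extract_threads(reply, source)
```

Because source Links are included when merging, entries that cite the same web page end up in one thread even without a reply Link between them, which matches the exception described above. Entries with no links at all do not appear in the output of this sketch.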
Ideally, the entries in a blog thread should share common topics. However, topics sometimes change at a particular point. In the future, we will separate a blog thread at that point even if both parts are connected via reply Links.


Figure (c) Blog thread
