CemWEB Research Project

Recent Entries

6/23/05 05:23 pm - Top 100 users by LiveRank

Data current as of December 2004

Look up your own results!

Name Friendscount LiveRank Overall Ranking Broadcaster-free Ranking
quizdiva 724 102 1 1
hipstomp 113 80 2 2
status 0 69 3 none
kim_jong_il__ 1 65 4 3
imjinnie 736 64 5 4
doctor_livsy 69 49 6 5
teh_indy 0 47 7 none
mistersleepless 105 45 8 6
thegraybook 202 44 9 7
k_richardson 0 43 10 8
dimkin 302 42 11 9
rcr 1 42 12 10
jyuufish 0 40 13 11
patiencekills 601 39 14 12
mcrjournal 0 37 17 none
omg_iconz_ 1 37 16 14
throwingstardna 305 37 15 13
worthlessunited 1 36 18 15
studio3dom 0 35 19 16
docbrite 50 33 20 17
_hdcomic 0 33 22 none
cassieclaire 24 33 21 18
jessichrissy 1 32 24 20
pottersues 22 32 23 19
theferrett 248 30 27 23
dolboeb 743 30 25 21
cleolinda 634 30 29 25
ficbitches 2 30 26 22
sam 1 30 28 24
fredryk 74 27 30 26
avva 535 26 31 27
with_gusto 674 26 34 30
jwz 129 26 33 29
vadimus 0 26 32 28
ladyjaida 750 24 35 31
seviet 130 23 38 34
ginmar 268 23 37 33
theformat 65 23 36 32
drugoi 602 22 40 36
chingizid 159 22 39 35
sexwax 1 22 41 37
olegpaschenko 343 21 47 43
andrewkendall 682 21 45 41
5signs 1 21 44 40
8mm 1 21 46 42
ds_flashback 1 21 43 39
the_bitchcave 3 21 42 38
kompressorpower 713 20 55 51
fif 2219 20 49 45
zloebu4ka 601 20 48 44
akuaku 728 20 57 53
kostia_inochkin 728 20 56 52
brad 180 20 54 50
slg_news 26 20 53 49
bookshop 492 20 52 48
prehistoric 151 20 51 47
yukipon 730 20 50 46
mozgovaya 720 19 60 56
polumrak 76 19 59 55
katechkina 243 19 58 54
nedorazumenie 471 19 68 64
neivid 298 19 67 63
copperbadge 72 19 66 62
ishotversace 0 19 65 61
zoe_trope 189 19 64 60
opportunitygrrl 741 19 63 59
hardartist 2 19 62 58
riksowden 0 19 61 57
josienutter 724 18 78 74
robont 744 18 75 71
horsepucky 345 18 77 73
spiritrover 737 18 76 72
hypnox 664 18 74 70
switchknife 347 18 73 69
kore 710 18 71 67
ana 637 18 70 66
icemaiden 0 18 72 68
canticle 0 18 69 65
mandelion 1 17 86 81
murdershack 340 17 85 80
tsl_colourbars 726 17 83 79
nl 55 17 81 77
krylov 681 17 80 76
romochka 13 17 89 84
sarahtales 134 17 87 82
clixnwhistles 0 17 84 none
cmpunk 7 17 82 78
hitlerhitler 0 17 79 75
hawthornehts 17 17 88 83
ljmatch 0 16 96 none
goblin_gaga 699 16 95 90
maccolit 269 16 94 89
olshansky 741 16 93 88
apazhe 345 16 92 87
p0grebizhskaya 744 16 90 85
anniesj 287 16 100 94
infinite_icons 722 16 99 93
imomus 724 16 98 92
muskrat_john 99 16 97 91
cmpriest 604 16 91 86

5/4/05 12:55 pm - Cemcom Blog RSS Feed on LiveJournal

My research group's new blog is now syndicated on LiveJournal. Add cemcom_rss to your friends list if you want to get your friends page flooded with cool articles about the intersection of technology and culture, as well as relevant CFPs and maybe (every now and then) original work from one of our members (including myself). The blog is fairly active: an average of one post a day, but most entries are short. Also feel free to comment on the blog itself!

3/16/05 09:58 pm - Just for Fun

Using researcher2's friends graph mentioned in this post on lj_research, jofish22 and I have come up with some useless statistics about LiveJournal usernames.

jofish22 enumerated the most common two characters to start your LJ name with here. I made a list of the frequency with which various numbers appear in LJ usernames here. Both of these are posts in lj_research.

We actually hope to have useful research to show soon too.

2/1/05 12:42 am - Workshop Paper Accepted

A paper using this project as a basis has been accepted to the Beyond Threaded Conversation workshop @ CHI 2005.

It is available here: Implicit Links in Asynchronous Communication Spaces (Medynskiy, 2005).

Online asynchronous communication spaces are rich in implicit relationships that are constructed through the collective activity of participants in these spaces. Mapping such relationships to edges between actor-nodes in the spaces often results in a graph structure with great potential to inform the design of and for these spaces. In this paper I present examples of implicit relationships in USENET, the blogosphere, and the LiveJournal community space. Further, I discuss design implications of the visualization and analysis of graph structures resulting from such links.

12/20/04 10:10 pm - Results!

This post will summarize my research results for the Fall 2004 semester.

After getting data on approximately 11,000 LiveJournal communities (approximately 1/20th of all communities), I was hoping to compare two representations/visualizations of the LiveJournal community-space. One representation is based on 'member of' links between communities (explicit links), and the other is based on the members that communities share in common (implicit links). As per my last post, however, it turns out that the crawling code I was using was keeping an incomplete set of 'member of' relationships between communities. Thus, the only correct visualizations I have are based on common members between two given communities. Nonetheless, I think the visualizations are rather interesting and I'll spend some time describing them. In the next post, I will talk about future direction.

First, I will outline the type of data collected:
Using the ljspider.pl script (described here), I gathered data on approximately 11,000 LiveJournal communities. Among the data I collected was a collection of membership lists of all crawled communities -- this is the only data set relevant to this post. It is very important to note that LiveJournal does not provide membership lists for communities with more than 500 members, which constituted 11.8% of all communities I crawled. For the purposes of the following analysis, these communities were dropped from the dataset. To refine the available data, I used the activecomm.pl script (described here) to highlight communities that seem to be active at least monthly (n = 30 in activecomm.pl) and only included those in my final data set. Finally, I computed the number of shared members between every pair of communities in the data set and logged all pairs of communities sharing at least 5 members. The Java code to do this final step will become available on this blog shortly.

The collected data can be thought of as representing a graph: every community is a vertex and the number of members two communities have in common is the weight of the undirected edge between the two nodes representing those communities.
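As a rough illustration of the pair-counting step and the resulting weighted graph, here is a minimal Python sketch. It is not the Java code mentioned above; the function name `shared_member_edges` and the input shape (a dict mapping each community name to its member list) are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def shared_member_edges(memberships, min_shared=5):
    """Return {(comm_a, comm_b): shared_count} for every pair of
    communities sharing at least `min_shared` members.

    `memberships` maps a community name to an iterable of usernames
    (a hypothetical input shape, chosen for this sketch).
    """
    # Invert the data: for each user, list the communities they belong to.
    communities_by_user = {}
    for community, members in memberships.items():
        for user in set(members):
            communities_by_user.setdefault(user, []).append(community)

    # Each user contributes 1 to the weight of every pair of
    # communities they are a member of.
    counts = Counter()
    for communities in communities_by_user.values():
        for a, b in combinations(sorted(communities), 2):
            counts[(a, b)] += 1

    # Keep only the undirected edges that meet the threshold.
    return {pair: n for pair, n in counts.items() if n >= min_shared}
```

Counting per user rather than intersecting every pair of member lists avoids comparing communities that share nobody, which matters when most pairs are unrelated.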

I've been using Graphviz, a free open-source graph visualizer from AT&T Research, to visualize my data. Other visualization tools I looked at, like Walrus or Pajek, were not able to produce acceptable results -- Walrus cannot deal with graphs that are not trees, and I could not get Pajek to arrange the nodes of the graph in any sort of logical order. However, Graphviz has a hard time working with large graphs, so in order to be able to use it for visualization I needed to constrain my original data set in the following ways:

  • Communities with fewer than 100 members were dropped

  • Communities that were not active at least monthly, as judged by their last five posts, were dropped. (see post about activecomm.pl here)

  • Edges representing fewer than 50 common members between two communities were dropped

Any edges pointing to a dropped community vertex, and any community vertex left without non-dropped outgoing/incoming edges, were also dropped. I created a DOT file representing the remaining community vertices and edges and visualized it with both the dot and neato utilities in the Graphviz package. The dataset included a lot of 2-, 3-, and 4-node subgraphs that I did not think were worth the time to analyze, so I removed any subgraph with fewer than 5 community nodes.
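The size, edge-weight, and small-subgraph constraints described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the input shapes (`sizes` as community to member count, `edges` as community pair to shared-member count), and the use of the DOT `weight` attribute are all choices made for the example, and the activity filter done by activecomm.pl is left out.

```python
def filter_graph(sizes, edges, min_members=100, min_shared=50, min_component=5):
    """Apply the size and edge-weight cut-offs, then drop small subgraphs.

    sizes: {community: member_count}       (hypothetical input shape)
    edges: {(comm_a, comm_b): shared}      (undirected, one entry per pair)
    """
    # Drop weak edges and edges touching a community that is too small.
    kept = {
        (a, b): w
        for (a, b), w in edges.items()
        if w >= min_shared
        and sizes.get(a, 0) >= min_members
        and sizes.get(b, 0) >= min_members
    }

    # Build an adjacency map; vertices with no surviving edge never appear.
    adjacency = {}
    for a, b in kept:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Find connected components with an iterative depth-first search.
    seen = set()
    keep_nodes = set()
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        if len(component) >= min_component:
            keep_nodes |= component

    # Both endpoints of an edge share a component, so testing one suffices.
    return {pair: w for pair, w in kept.items() if pair[0] in keep_nodes}


def to_dot(edges, name="communities"):
    """Serialize the undirected, weighted edge list as a DOT graph."""
    lines = [f"graph {name} {{"]
    for (a, b), w in sorted(edges.items()):
        lines.append(f'  "{a}" -- "{b}" [weight={w}];')
    lines.append("}")
    return "\n".join(lines)
```

The text returned by `to_dot` can be saved to a file and passed to the dot or neato utilities.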

I reran neato to generate a visualization of this final dataset, and tightened the image up (removed a lot of whitespace) with Photoshop. A scaled-down version of the image is behind the lj-cut and the full-size image is available by clicking on it.

Figure 1: Visualization of LiveJournal community space using shared-members links

The visualization shows various communities-of-communities in the LiveJournal community space that are tightly connected by sharing at least 50 members between some of their member communities. The community vertices themselves have a size representative of the number of total members, and are colored based on activity levels. Note that the n in the activity metric is the number of days given to the activecomm.pl script. Coarsely, however, we may say that red communities receive posts on average once between every month and every week, yellow communities once between every week and every day, and blue communities are posted to daily or more frequently.
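The coarse color reading above could be written down like this. It is a sketch only: the function name `activity_color`, the use of an average posting interval as input, and the exact boundary handling at 1, 7, and 30 days are assumptions, and activecomm.pl itself may judge activity differently.

```python
def activity_color(avg_days_between_posts):
    """Map an average posting interval (in days) to a vertex color.

    Thresholds follow the coarse description in the post:
    daily or more -> blue, weekly to daily -> yellow,
    monthly to weekly -> red. Anything slower is treated as
    inactive and gets no color (an assumption for this sketch).
    """
    if avg_days_between_posts <= 1:
        return "blue"
    if avg_days_between_posts <= 7:
        return "yellow"
    if avg_days_between_posts <= 30:
        return "red"
    return None
```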

One of the strengths of the shared-members visualization lies in its ability to capture and make salient the strength of inter-community ties, and possible paths of information flow between communities (e.g. through cross-posting or linking). Let's examine a small portion of the full visualization more carefully.

Figure 2: Blow-up of a section of Figure 1 showing three communities-of-communities subgraphs

Note the five connected, yellow communities in the top-left corner of Figure 2, for instance. The four communities that form a clique in the graph are players in LiveJournal's fake hair user-community (fake hair is popular with individuals identifying themselves with the goth, cybergoth, and nightclub lifestyles). There is a 'market' community where users may sell or buy fake hair and accessories, a 'pix' community where users post pictures of themselves wearing fake hair, and two other fake hair general-interest communities. All of these communities are active at least on a weekly basis (n = 7) and share between 50 and 100 members in each pair, indicating strong inter-community ties and most likely pointing to a high rate of information flow through the communities (e.g. through cross-posting). In the same group of communities, the 'cyberwarez' community is least connected to the rest, hinting at a somewhat different set of interests than the rest of the communities in the graph. Indeed, I found the 'cyberwarez' community to be one where users could buy and sell club and party clothing for the goth/cybergoth/nightclub scene - related to, but distinct from, the fake hair communities that made up the rest of the graph.

The other two subgraphs present in Figure 2 can be analyzed similarly. The large, mostly-blue subgraph represents a large, active, and somewhat tightly-knit community of buy/sell/exchange communities dedicated to used clothing, accessories, and similar goods. On the fringes of this group of communities are marketplace communities for a less well-defined range of goods ('marketplace', 'trade_stuff', 'subcultauctions') as well as a community specializing in hand-made goods ('__handmade'). The last subgraph seems to show an interesting link between the user-community of music producers and the user-community of LiveJournal users interested in aspects of the music industry. The connection seems to be mediated through the link between the 'audioeng' and the 'musicbiz' communities.

Showing mediation between communities-of-communities with similar (or not-so similar) overarching interests seems to be a potent ability of the shared-member representation (see Figure 3).

Figure 3: Example of mediation in LiveJournal community-space

Note how the ‘europe_history’ and especially the very active ‘middle_ages’ community in Figure 3 seem to mediate the general-interest history community-group (‘askahistorian’, ‘history’, and ‘historystudents’) and the ancient Greco-Roman history community-group (‘roma_antiqua’ and ‘classics’). I fully expect that as the dataset becomes less constrained (e.g. the cut-off value of shared number of members at which edges are dropped is lowered), such mediatory relationships will grow in number and diversity.

In my next post, I will draw some final conclusions about this method of visualizing/representing the LiveJournal community-space, and discuss further paths of research.

12/19/04 09:14 pm - Serious bug in ljspider.pl

Rather unfortunately, it turns out that ljspider.pl has a serious problem with regard to harvesting 'member of' links between communities. Basically, once a community is seen, no further 'member of' links pointing to it will be logged. This explains why all of the 'member of' graphs I've been generating are trees (every node has a maximum of one edge leading into it).
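To make the failure mode concrete, here is a small Python sketch of a breadth-first crawl that can log edges either before or after the "already seen" check. It is not the ljspider.pl code; the function name `crawl`, the `log_before_seen_check` switch, and the in-memory `member_of` dict standing in for fetched pages are all hypothetical.

```python
from collections import deque


def crawl(start, member_of, log_before_seen_check=True):
    """Breadth-first crawl over 'member of' links, returning logged edges.

    member_of: {community: [communities it links to]}, a stand-in
    for whatever the real spider fetches from the LJ servers.
    """
    seen = {start}
    queue = deque([start])
    edges = []
    while queue:
        source = queue.popleft()
        for target in member_of.get(source, []):
            if log_before_seen_check:
                # Correct behaviour: record every link, seen or not.
                edges.append((source, target))
            if target in seen:
                # The seen-set should only prevent re-crawling the target.
                continue
            seen.add(target)
            queue.append(target)
            if not log_before_seen_check:
                # Buggy behaviour: only the first link into a node is kept,
                # so every node ends up with at most one incoming edge.
                edges.append((source, target))
    return edges
```

With links a -> b, a -> c, and b -> c, the buggy ordering logs only (a, b) and (a, c), which is a tree; logging before the check also records (b, c).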

The fixed code is here: ljspider.pl.

12/13/04 11:54 pm - Another paper for Related Readings

While writing up and looking for references for my LIFE Undergraduate Research Report, I came across this short paper out of Microsoft Research: A Matter of Life or Death: Modeling Blog Mortality by Gina Venolia. The paper tries to fit a simple model of blog "life and death" to some raw data from LJ stats (http://www.livejournal.com/stats/stats.txt). More interestingly, however, the paper provides some nice graphs of LJ new-blog-per-day and new-post-per-day activity (including finding a relatively interesting weekly cycle), so instead of trying to come up with graphs of this data on my own, I'll just cite this from now on. Yay!

Added to Related Readings post too.

12/13/04 01:18 pm - UCHS Exemption

I have just received an email stating that this research has been reviewed by the Cornell UCHS Administrator and is exempt from the federal regulations for the protection of human subjects.

What a fast turn-around!

12/12/04 09:42 pm - Human Subjects Approval and Plan for Paper

I have finally submitted my UCHS (Human Subjects) initial approval form, and hopefully they'll be able to get back to me before the end of the semester. I feel this research easily deserves expedited approval, since I'm collecting only publicly available data and no attempt is made to extract personally-identifiable data.

I'm going to try and submit a paper about this research to this workshop at the CHI 2005 conference (being held this coming April in Portland, OR). The workshop title is Beyond Threaded Conversation, and so I'd have to bill this research as some sort of conversation-visualization or tracking system... Sounds quite doable, especially considering the last round of analysis and visualizations I've been doing.

Soon I'll be posting graphs and visualizations I made over the last few days. I think they're rather neat, if not too novel/exciting. For now, I'm concentrating on writing a report for the LIFE people (who gave me a research grant at the beginning of the summer) and something I can give to Dan/Phoebe so that they can see what happened with this project over the semester/give me a final grade.

12/4/04 09:58 pm - A note on ljspider.pl -- it's a *feature* not a bug (:

It turns out some communities that ljspider.pl thinks it has crawled successfully actually fail because of non-response of the LJ servers. This leaves an empty file as their cache, and thus these communities are easy to single out and rerun. You can put these communities into a new queue_community (saving the old one), delete their _info files (so that the script knows to hit them again), run the script until all the unprocessed communities are taken care of, break, and then combine the old queue_community and the new one. Since seen_community remains, no duplicate communities should be listed (same for seen_user).
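Singling out the failed communities amounts to listing the zero-byte cache files. A minimal Python sketch, assuming all cache files sit in one directory and are named after their communities (the directory layout and the function name `empty_cache_entries` are assumptions, not taken from ljspider.pl):

```python
from pathlib import Path


def empty_cache_entries(cache_dir):
    """Return the names of zero-byte files in `cache_dir`, sorted.

    Each name is assumed to identify a community whose fetch failed
    and which should go into the new queue_community.
    """
    return sorted(
        path.name
        for path in Path(cache_dir).iterdir()
        if path.is_file() and path.stat().st_size == 0
    )
```

The returned names could then be written out, one per line, as the new queue before rerunning the script.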

I'm not going to fix the ljspider.pl code right now, since the workaround is rather straightforward and the problem doesn't seem to happen very often. I may get around to doing that at some point down the line, however.