Executive Summary

MSRA expects a result-oriented project summary (500-1,000 words) that describes goals and results of the project. Content should include the proposed goals, publication details, project team members. Please also cite any special awards or recognition the project has received.

The proposed goals of the project were: an innovative multi-dimensional model for community structure on the Web 2.0, especially for weblogs; an efficient community identification algorithm based on the delta-closure model or information bottleneck theory; measurements of the relationships within the complex community structure of weblogs; and observations of significant characteristics of community evolution on the Web 2.0, together with efficient evolutionary computation algorithms for web communities. A new method for community detection based on the information bottleneck was also proposed.

In our research on the World Wide Web, we extend Menczer's work and study the impact of content similarity on topological connectivity and clustering. By introducing two metrics, linkage probability and triangularization probability, we find that both are proportional to a polynomial function of the content similarity. These findings are validated by simulation and by theoretical analysis. Content similarity is therefore a significant factor in topological connectivity and clustering, and content plays an important role in the process of Web evolution.
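As an illustration of how a linkage probability could be estimated from data, the sketch below bins page pairs by content similarity and reports the fraction of linked pairs in each bin. This is only one plausible reading of the metric: the function name linkage_probability, the (similarity, linked) input format, and the equal-width binning are assumptions made for this example and are not taken from the project.

```python
def linkage_probability(pairs, num_bins=10):
    """Estimate P(link | similarity) by binning page pairs.

    pairs: iterable of (similarity, is_linked), with similarity in [0, 1].
    Returns one value per bin: the fraction of pairs in that bin that
    are linked, or None if the bin is empty.
    NOTE: the input format and binning scheme are illustrative assumptions.
    """
    totals = [0] * num_bins
    linked = [0] * num_bins
    for similarity, is_linked in pairs:
        # Clamp so that similarity == 1.0 falls into the last bin.
        b = min(int(similarity * num_bins), num_bins - 1)
        totals[b] += 1
        if is_linked:
            linked[b] += 1
    return [linked[b] / totals[b] if totals[b] else None
            for b in range(num_bins)]


if __name__ == "__main__":
    sample = [(0.1, False), (0.15, True), (0.9, True), (1.0, True)]
    print(linkage_probability(sample, num_bins=2))
```

A triangularization probability could be estimated in the same way by replacing the linked flag with a flag that records whether the pair closes a triangle.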

Based on our previous studies, we propose a model that combines vertex connectivity and content similarity in a proportional manner. Analytical solutions indicate that the model exhibits a power-law degree distribution whose exponent varies with the weight given to content similarity. The distribution of content similarity over connected vertex pairs shows that web pages with similar content tend to be linked together. Simulation results show that the model agrees remarkably well with real networks in both degree and content similarity distributions.
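The summary does not state the exact attachment rule, so the following is only a minimal sketch of one way connectivity and content similarity might be mixed proportionally: a new page links to an existing page with a probability that is a weighted sum of that page's normalized degree and its normalized similarity to the new page. The function name attachment_probabilities and the parameter weight are hypothetical.

```python
def attachment_probabilities(degrees, similarities, weight):
    """Mix degree-based and similarity-based attachment (assumed form).

    degrees:      degree of each existing vertex
    similarities: content similarity of each existing vertex to the new one
    weight:       share given to content similarity, in [0, 1]
    Returns the probability of attaching to each existing vertex.
    NOTE: this linear mixture is an assumption, not the project's formula.
    """
    deg_total = sum(degrees)
    sim_total = sum(similarities)
    probs = []
    for k, s in zip(degrees, similarities):
        deg_term = k / deg_total if deg_total else 0.0
        sim_term = s / sim_total if sim_total else 0.0
        probs.append((1.0 - weight) * deg_term + weight * sim_term)
    return probs


if __name__ == "__main__":
    print(attachment_probabilities([1, 3], [0.5, 0.5], weight=0.5))
```

With weight set to 0 the rule reduces to pure preferential attachment, and with weight set to 1 links depend on content similarity alone, which is consistent with the idea that the weight controls the degree exponent.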

We have also studied community detection in large-scale real-world networks using evolutionary game theory (EGT). Each node in the network chooses a strategy, either to cooperate or to defect, and nodes within the same community converge to the same choice. The community structure therefore emerges from a random initial state through the evolutionary game process. Our results show that EGT achieves higher community detection accuracy than previous related work. Moreover, we proposed a network model that can generate networks with community structure of varying strength. Using that model, we studied the effect of community structure strength on detection accuracy. The results show that the stronger the community structure, the more accurate the detection algorithm. Furthermore, we introduced, for the first time, the Gini coefficient and the Pareto exponent to investigate the wealth distribution in the population during the evolution, and found a possible way to minimize inequality. This may help in understanding network resource allocation and may be instructive for public welfare policy.
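For reference, the Gini coefficient mentioned above can be computed from a list of non-negative payoffs with the standard sorted-values formula. The sketch below is generic and does not reproduce the project's simulation; the function name gini and the treatment of node payoffs as "wealth" are assumptions for illustration.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality).

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    where x_1 <= ... <= x_n are the sorted values and i runs from 1 to n.
    Returns 0.0 for an empty list or when all values are zero.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    print(gini([1, 1, 1, 1]))   # equal wealth
    print(gini([0, 0, 0, 1]))   # one node holds everything
```

For a population of n nodes the value ranges from 0 (all payoffs equal) to (n - 1) / n (one node holds all the wealth), so tracking it over game rounds gives a single number for how inequality changes during the evolution.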

Based on long-term tracking of blog community evolution, we will build a demonstration system for community identification and assemble a blog community dataset, which will be released to the public after the project as a test collection for community analysis and blog research. Finding the core members of virtual communities is an intriguing problem for social network analysis and related areas, and we presented an SA algorithm to solve it. We also applied and evaluated this algorithm on a real-world online community, Douban.com, where it produced satisfactory results.

In addition, our group participated in the TREC 2007 Enterprise Track, for which we designed an expert search system. Of the 17 groups that participated in the document search task, our group placed 2nd; of the 15 groups that participated in the expert search task, our run placed 4th.