Detecting a Multi-Level Content Similarity from Microblogs based on Community Structures and Named Entities

Swit Phuvipadawat and Tsuyoshi Murata

Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Japan

Abstract—This paper presents a method for finding content similarity in microblogs. In particular, we process data from Twitter for a breaking-news detection and tracking application. The goal is to find collections of similar messages. The method produces collections at two levels. In the first level, similarity is defined by TF-IDF. Since microblog messages are short, we place emphasis on specific terms called named entities. Message groups are obtained in the first level. In the second level, we construct a network from the message groups and named entities and perform community detection. We evaluate and visualize the community results produced by several community detection algorithms. We demonstrate that this method can be used to explore similar messages, yielding groupings that are both tightly and loosely coupled.
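
The first level described above can be illustrated with a minimal sketch of TF-IDF cosine similarity in which named-entity terms carry extra weight. It assumes messages are already tokenized and lowercased, and that named entities arrive as a plain set of tokens (for example, from an external named-entity recognizer). The boost factor `ne_boost`, the similarity `threshold`, and the greedy grouping in the hypothetical helper `group_messages` are illustrative choices, not the authors' published procedure.

```python
import math
from collections import Counter


def tfidf_vectors(docs, named_entities, ne_boost=2.0):
    """Return one sparse TF-IDF vector (dict term -> weight) per message.

    named_entities and ne_boost are assumptions for illustration: terms in
    named_entities have their TF-IDF weight multiplied by ne_boost.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each term once per message
    vectors = []
    for tokens in docs:
        vec = {}
        for term, count in Counter(tokens).items():
            weight = count * math.log(n / df[term])  # raw tf times idf
            if term in named_entities:
                weight *= ne_boost  # emphasize named entities in short texts
            vec[term] = weight
        vectors.append(vec)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def group_messages(docs, named_entities, threshold=0.3, ne_boost=2.0):
    """Hypothetical single-pass grouping: each message joins the most similar
    existing group (compared against that group's first message) if the
    similarity reaches threshold; otherwise it starts a new group."""
    vecs = tfidf_vectors(docs, named_entities, ne_boost)
    groups = []  # each group is a list of message indices
    for i, v in enumerate(vecs):
        best, best_sim = None, threshold
        for g in groups:
            sim = cosine(v, vecs[g[0]])
            if sim >= best_sim:
                best, best_sim = g, sim
        if best is None:
            groups.append([i])
        else:
            best.append(i)
    return groups


if __name__ == "__main__":
    msgs = [
        ["earthquake", "hits", "japan"],
        ["strong", "earthquake", "japan", "tsunami"],
        ["apple", "releases", "iphone"],
        ["new", "iphone", "apple", "store"],
    ]
    entities = {"japan", "apple", "iphone"}
    print(group_messages(msgs, entities))  # prints [[0, 1], [2, 3]]
```

On this toy input, setting `ne_boost=1.0` leaves every message in its own group at the same threshold, which shows why weighting named entities matters when messages share only a few terms. The resulting groups could then serve as nodes, linked to the named entities they mention, for the second-level network; that step is not sketched here.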

Index Terms—Twitter, topic detection and tracking, information retrieval, network analysis

Cite: Swit Phuvipadawat and Tsuyoshi Murata, "Detecting a Multi-Level Content Similarity from Microblogs based on Community Structures and Named Entities," Journal of Emerging Technologies in Web Intelligence, Vol. 3, No. 1, pp. 11-19, February 2011. doi:10.4304/jetwi.3.1.11-19