NLDB (Natural Language for Databases) Conference, June 2002, Stockholm, Sweden

A Multilevel Text Processing Model of Newsgroup Dynamics

Miroslav Martinovic, G. Sampath




TOPIC AREA: Information Retrieval, Text Summarization, NLP Tools and Resources

ABSTRACT

We present a multilevel model of discussions in USENET newsgroups that uses statistical and linguistic methods to obtain lexical, semantic, and discourse characteristics of the text. In document mining, the amorphous, unstructured nature of text makes it difficult to extract and summarize information in useful ways. In newsgroup discussions, by contrast, several constraints make information extraction and summarization more amenable to analysis at different levels. The model exploits several characteristics of newsgroup discussions, such as posting structure, times of posting, time spans, and the length and depth of a thread, and uses them to extract higher-level information on subject matter, interest level, topicality, and discussion trends. Techniques for summarizing discussions and detecting their characteristics are outlined.
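
As an illustration of the thread-level characteristics named above (length, depth, and time span), the following is a minimal sketch and not the authors' implementation. It assumes each posting is available as a record holding a message identifier, the identifier of the posting it replies to, and a posting time. The names `Posting`, `msg_id`, `parent_id`, `posted_at`, and `thread_features` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class Posting:
    # Hypothetical record; field names are illustrative, not from the paper.
    msg_id: str
    parent_id: Optional[str]  # None for the posting that starts the thread
    posted_at: datetime


def thread_features(postings: List[Posting]) -> Dict[str, object]:
    """Compute length, depth, and time span of a single thread."""
    if not postings:
        raise ValueError("thread must contain at least one posting")
    by_id = {p.msg_id: p for p in postings}

    def depth(p: Posting) -> int:
        # Count reply links back to the root; a root posting has depth 0.
        d = 0
        seen = {p.msg_id}
        while p.parent_id is not None and p.parent_id in by_id:
            p = by_id[p.parent_id]
            if p.msg_id in seen:  # guard against malformed reply cycles
                break
            seen.add(p.msg_id)
            d += 1
        return d

    times = [p.posted_at for p in postings]
    return {
        "length": len(postings),                   # number of postings
        "depth": max(depth(p) for p in postings),  # longest reply chain
        "time_span": max(times) - min(times),      # first to last posting
    }


# Example thread: a root posting, two direct replies, and one nested reply.
t0 = datetime(2002, 6, 1, 12, 0)
example_thread = [
    Posting("a", None, t0),
    Posting("b", "a", t0 + timedelta(hours=1)),
    Posting("c", "b", t0 + timedelta(hours=3)),
    Posting("d", "a", t0 + timedelta(hours=2)),
]
features = thread_features(example_thread)
```

Under these assumptions, such per-thread values could serve as low-level inputs to the higher-level measures the model targets, such as interest level or discussion trends. How the model actually combines them is not specified in this abstract.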