Archive for the ‘NaN’ Category.

CNetS researchers comment on Twitter

Channel 13 video

A report on the popularity of Twitter at IU (which ranks among the top 10 universities on a number of metrics) has sparked some interest in the local media about work CNetS researchers are doing on Twitter usage. An interview with Filippo Menczer, associate director of CNetS, appeared on the front page of the Herald-Times on Oct 16, 2009. Indianapolis NBC affiliate Channel 13 interviewed Menczer and CNetS postdoc Bruno Gonçalves for their news program that night. The story was also picked up by the Chicago Tribune, US News & World Report, The Republic, Indianapolis Star, NewsDay, Courier-Journal, Indianapolis Business Journal, News-Sentinel, WIBC, The Indy Channel, WHAS, Journal & Courier, Palladium-Item, Star Press, and IDS.

Congratulations Ben and Le-Shin!

Dr. Ben

Dr. Le-Shin

Congratulations to Ben and Le-Shin, or rather Dr. Markines and Dr. Wu! They both successfully defended their dissertations this summer, earning their PhDs!

NaN’s strong presence at HT09

NaN had a strong presence at Hypertext 2009 in Torino:

Informatics team finds simple rules that explain universal laws of written text

Similarity Cloud for 'mac' vs 'pc'

Alessandro Flammini and Filippo Menczer, along with M. Ángeles Serrano from the University of Barcelona, have authored a paper entitled “Modeling Statistical Properties of Written Text” that has been published in PLoS ONE. The paper introduces and validates a generative model that explains, from simple rules, the simultaneous emergence of patterns of written text observed in many languages. The paper focuses on the well-known Zipf’s law of word frequencies, as well as additional patterns such as Heaps’ law of word diversity, the bursty nature of rare words, and similarity among documents. Through their model, the researchers found a connection between word burstiness and the topicality of text. In addition, they identify dynamic word ranking and memory across documents as key mechanisms to explain the organization of written text. The semantic similarity between topics, which is one of the features that the model aims to explain, is visualized by the Similarity Cloud, an online tool developed by computer science graduate student Mark Meiss. The model developed by the researchers and the findings of this paper could lead to improved techniques for identifying key terms that capture the topics of a Web page, which is crucial for matching search queries to relevant results and ads.
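For readers unfamiliar with the two laws named above, here is a minimal Python sketch of how they are usually measured on a text. It is not the generative model from the paper; the function names and the toy sentence are made up for illustration. Zipf’s law concerns how word frequency falls off with frequency rank, and Heaps’ law concerns how the number of distinct words grows with the length of the text.

```python
from collections import Counter


def rank_frequency(tokens):
    """Word counts sorted from most to least frequent (the Zipf view)."""
    counts = Counter(tokens)
    return [count for _, count in counts.most_common()]


def vocabulary_growth(tokens):
    """Number of distinct words seen after each token (the Heaps view)."""
    seen = set()
    growth = []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))
    return growth


# Toy input, far too small to show a real power law; illustration only.
text = "the cat sat on the mat and the dog sat on the rug"
tokens = text.split()

freqs = rank_frequency(tokens)       # frequency at rank 1, 2, 3, ...
growth = vocabulary_growth(tokens)   # vocabulary size after 1, 2, 3, ... tokens

print(freqs)
print(growth)
```

On a real corpus one would plot `freqs` against rank and `growth` against text length on log-log axes and look for approximately straight lines.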

NaN abstract for Le-Shin’s defense alpha

As adaptive peer network systems become an increasingly important development in Web search technology, this research introduces an alternative model for peer-based Web search to address the scaling problem of centralized search engines. Queries are first matched against the local engine, and then routed to neighbor peers to obtain more results. Initially the network has a random topology (like Gnutella) and queries are routed randomly as in the flood model. However, the protocol includes a learning algorithm by which each peer uses the results of its interactions with its neighbors to refine a model of the other peers. This model is used to dynamically route queries according to the predicted match with other peers’ knowledge. The network topology is thus modified on the fly based on learned contexts and current information needs.
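The routing idea in this abstract can be illustrated with a small Python sketch. This is a toy under stated assumptions, not the protocol from the dissertation: the `Peer` class, the additive score update, and the single-hop forwarding are all hypothetical simplifications, and the on-the-fly rewiring of the topology is left out. Each peer answers from its local index first, then forwards the query to the neighbors its learned profile ranks highest, and updates that profile from the hits that come back.

```python
import random


class Peer:
    """Toy peer: a local index plus a learned profile of each neighbor."""

    def __init__(self, name, documents, seed=0):
        self.name = name
        self.documents = documents   # doc id -> set of terms
        self.neighbors = []
        self.profile = {}            # neighbor name -> {term: learned score}
        self.rng = random.Random(seed)

    def local_search(self, query):
        """Return ids of local documents sharing at least one query term."""
        return [doc for doc, terms in self.documents.items() if query & terms]

    def predicted_match(self, neighbor, query):
        """Sum of learned scores for the query terms (0 if nothing learned)."""
        learned = self.profile.get(neighbor.name, {})
        return sum(learned.get(term, 0.0) for term in query)

    def route(self, query, fanout=1):
        """Pick the neighbors with the best predicted match.

        Shuffling before the stable sort means ties (for example, before
        anything has been learned) are broken at random.
        """
        candidates = list(self.neighbors)
        self.rng.shuffle(candidates)
        candidates.sort(key=lambda n: self.predicted_match(n, query), reverse=True)
        return candidates[:fanout]

    def search(self, query, fanout=1):
        """Match locally, forward one hop, and learn from the responses."""
        results = [(self.name, doc) for doc in self.local_search(query)]
        for neighbor in self.route(query, fanout):
            hits = neighbor.local_search(query)
            learned = self.profile.setdefault(neighbor.name, {})
            for term in query:
                # Assumed update rule: reward terms by the number of hits.
                learned[term] = learned.get(term, 0.0) + len(hits)
            results.extend((neighbor.name, doc) for doc in hits)
        return results


alice = Peer("alice", {"a1": {"music"}})
bob = Peer("bob", {"b1": {"networks", "search"}})
carol = Peer("carol", {"c1": {"cooking"}})
alice.neighbors = [bob, carol]

query = {"search"}
for _ in range(5):
    alice.search(query, fanout=2)   # early on, ask everyone and learn

best = alice.route(query, fanout=1)[0]
print(best.name)
```

After a few rounds of asking both neighbors, alice's profile favors bob for this query, so a later search with `fanout=1` goes to bob rather than to a random neighbor.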

NaN talk for May 5, 2009: Mark Meiss, pre-Alpha Thesis Defense

This week at NaN I’ll be running through a very preliminary version of my thesis talk, “Structural Mining of Large-Scale Behavioral Data from the Internet,” which is all about the things you can discover using network flow data and Web clicks. The main feedback I’m looking for has to do with organization and pruning: I’m not terribly confident of the order in which I present the material, and I have a LOT more things to say than time to say them in, so I can use some suggestions on what to keep and what to just point to the actual document for.

(And, yes, there will be cookies.)

NaN abstract for Alejandro’s Defense Alpha

Online documents provide a rich information resource for aiding the generation of concept-map-based knowledge models, but analyzing resources to select concepts and links is a time-consuming task. This work focuses on harnessing the information in unstructured text documents using text mining algorithms to generate preliminary concept maps automatically. These maps can be used to assist human users in question-answering tasks or to support automatic document classification.

NaN abstract for Jacob’s April 21st talk

I will talk about some work related to the problem of predicting the popularity of online content, and some initial results from my experiments in this area. In more detail, I’ll give an overview of work by Leskovec et al. and Huberman et al. on modeling and predicting growth, then outline the results of two initial experiments.

NaN Abstract for Michael Conover’s April 21st Talk

“The problem with Wikipedia is that it only works in practice. In theory, it can never work.”  — Zeroeth Law of Wikipedia

One of the most important social and intellectual phenomena of the 21st century, the collaboratively-edited online encyclopedia Wikipedia is vexing in its ability to produce informative articles on a multitude of subjects. Leveraging graph-theoretic techniques to measure the degree to which latent connections between articles are present in the Wikipedia corpus, we demonstrate that the collaborative editing process produces, over time, an increasingly logically-connected information artifact. Moreover, using the public-domain 1911 Encyclopedia Britannica as a benchmark corpus for the single-author-article paradigm, we demonstrate that Wikipedia contains a growing core of mature articles which exhibit a degree of logical connectedness significantly surpassing that found in the Encyclopedia Britannica. Taken in conjunction with an understanding of Wikipedia’s accuracy and topical coverage, this conclusion paints a rich portrait of the strengths and weaknesses of both collaboratively- and single-author-edited encyclopedias.

Ruj’s Alpha Defense

I’m presenting the pre-defense content of my research. There will also be cakes!