Natural Language Processing/Machine Learning


Corvey, William J., Sarah Vieweg, Sudha Verma, Martha Palmer and James H. Martin (in press). Foundations of a Multilayer Annotation Framework for Twitter Communications During Crisis Events. Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), May 21-27, 2012, Istanbul, Turkey.

In times of mass emergency, vast amounts of data are generated via computer-mediated communication (CMC) that are difficult to manually collect and organize into a coherent picture. Yet valuable information is broadcast, and can provide useful insight into time- and safety-critical situations if captured and analyzed efficiently and effectively. We describe a natural language processing component of the EPIC (Empowering the Public with Information in Crisis) Project infrastructure, designed to extract linguistic and behavioral information from tweet text to aid in the task of information integration. The system incorporates linguistic annotation, in the form of Named Entity Tagging, as well as behavioral annotations to capture tweets contributing to situational awareness and analyze the information type of the tweet content. We show classification results and describe future integration of these classifiers in the larger EPIC infrastructure.

Mark, Gloria, Mossaab Bagdouri, Leysia Palen, James H. Martin, Ban Al-Ani, Ken Anderson (2012). Blogs as a Collective War Diary. 2012 ACM Conference on Computer Supported Cooperative Work, Bellevue, WA.

Disaster-related research in human-centered computing has typically focused on the shorter-term, emergency period of a disaster event, whereas effects of some crises are long-term, lasting years. Social media archived on the Internet provides researchers the opportunity to examine societal reactions to a disaster over time. In this paper we examine how blogs written during a protracted conflict might reflect a collective view of the event. The sheer amount of data originating from the Internet about a significant event poses a challenge to researchers; we employ topic modeling and pronoun analysis as methods to analyze such large-scale data. First, we discovered that blog war topics temporally tracked the actual, measurable violence in the society, suggesting that blog content can be an indicator of the health or state of the affected population. We also found that people exhibited a collective identity when they blogged about war, as evidenced by a higher use of first person plural pronouns compared to blogging on other topics. Blogging about daily life decreased as violence in the society increased; when violence waned, there was a resurgence of daily life topics, potentially illustrating how a society returns to normalcy.

Starbird, Kate, Grace Muzny and Leysia Palen (2012). Learning from the Crowd: Collaborative Filtering Techniques for Identifying On-the-Ground Twitterers during Mass Disruptions. Proceedings of the Conference on Information Systems for Crisis Response and Management (ISCRAM 2012), Vancouver, BC.

Social media tools, including the microblogging platform Twitter, have been appropriated during mass disruption events by those affected as well as the digitally-convergent crowd. Though tweets sent by those local to an event could be a resource both for responders and those affected, most Twitter activity during mass disruption events is generated by the remote crowd. Tweets from the remote crowd can be seen as noise that must be filtered, but another perspective considers crowd activity as a filtering and recommendation mechanism. This paper tests the hypothesis that crowd behavior can serve as a collaborative filter for identifying people tweeting from the ground during a mass disruption event. We test two models for classifying on-the-ground Twitterers, finding that machine learning techniques using a Support Vector Machine with asymmetric soft margins can be effective in identifying those likely to be on the ground during a mass disruption event.


Verma, Sudha, Sarah Vieweg, Will Corvey, Leysia Palen, Jim Martin, Martha Palmer, Aaron Schram and Ken Anderson (2011). NLP to the Rescue? Extracting “Situational Awareness” Tweets During Mass Emergency. In the Fifth International AAAI Conference on Weblogs and Social Media, 17-21 July 2011, Barcelona, Spain.

In times of mass emergency, vast amounts of data are generated via computer-mediated communication (CMC) that are difficult to manually cull and organize into a coherent picture. Yet valuable information is broadcast, and can provide useful insight into time- and safety-critical situations if captured and analyzed properly and rapidly. We describe an approach for automatically identifying messages communicated via Twitter that contribute to situational awareness, and explain why it is beneficial for those seeking information during mass emergencies. We collected Twitter messages from four different crisis events of varying nature and magnitude and built a classifier to automatically detect messages that may contribute to situational awareness, utilizing a combination of hand-annotated and automatically-extracted linguistic features. Our system was able to achieve over 80% accuracy on categorizing tweets that contribute to situational awareness. Additionally, we show that a classifier developed for a specific emergency event performs well on similar events. The results are promising, and have the potential to aid the general public in culling and analyzing information communicated during times of mass emergency.

Bagdouri, Mossaab (2011). Topic modeling as an analysis tool to understand the impact of the Iraq war on the Iraqi blogosphere. University of Colorado at Boulder MS Thesis.


Corvey, W. J., Vieweg, S., Rood, T. and Palmer, M. (2010). Twitter in Mass Emergency: What NLP Techniques can Contribute. In Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics in a World of Social Media (Los Angeles, California, June 2010), 23–24.

We detail methods for entity span identification and entity class annotation of Twitter communications that take place during times of mass emergency. We present our motivation, method and preliminary results.

Palen, L., Anderson, K. M., Mark, G., Martin, J., Sicker, D., Palmer, M., and Grunwald, D. (2010). A vision for technology-mediated support for public participation and assistance in mass emergencies and disasters. In Proceedings of the 2010 ACM-BCS Visions of Computer Science Conference (Edinburgh, United Kingdom, April 14 – 16, 2010). ACM-BCS Visions of Computer Science. British Computer Society, Swinton, UK, 1-12.

We present a vision of the future of emergency management that better supports inclusion of activities and information from members of the public during disasters and mass emergency events. Such a vision relies on integration of multiple subfields of computer science, and a commitment to an understanding of the domain of application. It supports the hopes of a grid/cyberinfrastructure-enabled future that makes use of social software. However, in contrast to how emergency management is often understood, it aims to push beyond the idea of monitoring on-line activity, and instead focuses on an understudied but critical aspect of mass emergency response—the needs and roles of members of the public. By viewing the citizenry as a powerful, self-organizing, and collectively intelligent force, information and communication technology can play a transformational role in crisis. Critical topics for research and development include an understanding of the quantity and quality of information (and its continuous change) produced through computer-mediated communication during emergencies; mechanisms for ensuring trustworthiness and security of information; mechanisms for aligning informal and formal sources of information; and new applications of information extraction techniques.
