UF Informatics Institute
432 Newell Drive, CISE Bldg E252
Gainesville, FL 32611-5585
The ability to automatically distill information flowing through social media
platforms is highly relevant to understanding current worldwide events,
human behaviors, and more. Among other things, this automatic distillation
relies on low-level Natural Language Processing (NLP) tasks such as Named
Entity Recognition (NER), which consists of segmenting proper names in text.
NER is typically treated as a sequence labeling problem, where the input is a
sequence of words and the output of the process is a sequence of predictions.
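
As a rough illustration of the sequence labeling view, the sketch below pairs each word with one predicted tag and then groups the tags back into entity spans. The BIO tagging scheme, the example sentence, the entity type names, and the helper `extract_entities` are assumptions made for illustration; the abstract does not specify them.

```python
from typing import List, Optional, Tuple


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO tags into (entity_text, entity_type) pairs.

    Assumes one tag per token: "B-<type>" starts an entity,
    "I-<type>" continues it, and "O" marks a non-entity word.
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; close any entity still open.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O" tag, or a stray "I-" tag with no matching open entity:
            # close the open entity (if any) and skip this token.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

    # Close an entity that runs to the end of the sentence.
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Hypothetical input: a sequence of words and one predicted tag per word.
tokens = ["just", "landed", "in", "New", "York", "with", "Sam"]
tags = ["O", "O", "O", "B-location", "I-location", "O", "B-person"]

print(extract_entities(tokens, tags))
```

In this framing a model only has to produce the `tags` list; the segmentation of proper names follows from the tags.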
Sophisticated models based on deep representation learning approaches
show impressive performance on sequence labeling tasks on newswire text.
However, when these same approaches are given social media data, their
performance degrades dramatically.
Social media text is difficult to process automatically because of its
ever-changing vocabulary, flexible grammar, and other nuances, such as
mixed-language text.
Solorio will present the ongoing efforts to solve the NER task in text from
social media and micro-blogging platforms, including Reddit, YouTube,
Twitter, and StackExchange. She will discuss the need to model text at
different levels to achieve robustness to noise, and will motivate the need to
combine a traditional machine learning approach to NER with the muscle of
deep learning architectures. The resulting system was the top-performing system at the
recent shared task on Novel and Emerging Entity Recognition hosted by the
Empirical Methods in Natural Language Processing (EMNLP) Workshop on
Noisy User Generated Text 2017.