Computational Processing of Multiword Expressions

Lecturer: Prof. Dr. Carlos Ramisch (Univ. de Marseille – France)

Abstract: Multiword expressions are lexical units made up of more than one lexeme that show lexical, syntactic, semantic, pragmatic or statistical idiosyncrasies (Baldwin & Kim, 2010). Examples include idiomatic expressions ("engolir um sapo", "quebrar um galho", "um Deus nos acuda"), nominal compounds (washing machine, high-heel shoes) and support-verb constructions (take pictures, take a shower), among others. In both Linguistics and Computer Science, identifying and representing these expressions is a real challenge for researchers, lexicographers and software developers. This workshop is aimed at linguists and computer scientists who wish to learn more about the computational processing of multiword expressions. The first part introduces the topic with many examples of applications in which identifying and handling these expressions is crucial for natural and accurate results. The second part introduces the mwetoolkit, a tool that helps extract and handle lists of expressions from text corpora. The third part discusses the analysis of the extracted expressions and its applications. The workshop combines theoretical and practical parts, with exercises that use the mwetoolkit to extract expressions from text corpora in electronic format.


Corpora compilation for spontaneous speech analysis

Lecturers: Profa. Dra. Heliana Mello (UFMG) and Prof. Dr. Tommaso Raso (UFMG)

Abstract: Compiling a corpus of spontaneous speech requires features that theory and technology have made both possible and necessary. This workshop will illustrate: what it means to have a corpus that actually represents spontaneous speech, and not only one or a few genres within this universe; what is needed to study spontaneous speech itself, and not only a written text derived from an oral source, that is, a transcription; and how speech can be parsed into its reference units, which are very different from the sentences of written texts. To this end, the workshop will present: an architecture based on diaphasic variation; how text-sound alignment allows the speech event to be represented without reducing it to a written text; and why the large amount of information conveyed exclusively by the acoustic signal is indispensable for analyzing this language modality, which is in fact the only natural one.


Introduction to Statistics

Lecturer: Prof. Dr. Crysttian A. Paixão (UFSC)

Abstract: Statistics can be applied to a wide range of fields of knowledge and is part of almost every research project. This workshop gives a brief introduction to statistical techniques through practical examples, in order to offer a new perspective on methods such as hypothesis tests, confidence intervals and non-parametric tests, such as the chi-square test and Fisher's exact test.


The computational interface El Grial: working with texts in Spanish

Lecturer: Giovanni Parodi (PUC/Valparaíso - Chile)

Abstract: El Grial is a computational interface that allows not only morphosyntactic tagging of texts in Spanish, but also queries on the compiled corpora available on the site. The texts are classified by genre, discipline and register, and they can be accessed by means of these tags. This workshop offers a general description of the tool, as well as its main functions and components. There will also be practical exercises to explore the potential of the corpora as resources for diverse applications, such as comprehension assessment design, corpus building for learners, and genre-based writing.