The obligations and common ground structure of practical dialogues
Luis A. Pineda (a), Varinia M. Estrada (a), Sergio R. Coria (a), James F. Allen (b)
(a) Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas (IIMAS), Universidad Nacional Autónoma de México (UNAM)
(b) Department of Computer Science, University of Rochester

In this paper a theory of dialogue structure of task oriented conversations and its associated tagging scheme are
presented. The theory introduces two linguistic structures supporting the dialogue that, following traditional
terminology, we call the obligations and common ground. The theory is illustrated with the detailed analysis of a
transaction. We also describe the empirical work supporting the theory, as well as an evaluation task. The paper is
concluded with a reflection on the relation of the present theory to traditional notions of obligations and grounding,
its relation to a more general theory of discourse and conversation and its potential application to the construction of
spoken natural language systems.
Keywords: Conversational structure, obligations and common ground, dialogue models, dialogue managers.

Inteligencia Artificial, Revista Iberoamericana de Inteligencia Artificial. Vol 11, No. 36 (2007), pp. 9-17. ISSN: 1137-3601. AEPIA

1. Introduction

In this paper it is postulated that transactions in task oriented conversations or practical dialogues [2] are supported by two linguistic structures that we call the obligations and the common ground. These structures are ‘built’ by the speech or dialogue acts performed by the conversational participants, and a task oriented transaction is successfully concluded when the construction of these two structures comes [...]

The structure of obligations involves the specification of intentions through the realization of speech acts by one conversational participant, and the satisfaction of such intentions through linguistic acts, or perhaps through acts expressed in alternative modalities, by the same or the other participant. For instance, an action directive stated by U (user) creates the obligation on S (system) to perform the specified act, provided that social and other contextual conditions hold. The structure of obligations is defined as the relation between the speech acts that state this kind of intentions and the speech acts that satisfy them, within conversational transactions. This structure is based on such strong traditions and social conventions that it is satisfied even in non-cooperative conversations (e.g. [9]).

The common ground structure, on the other hand, is defined as the relation between the speech acts through which conversational participants make sure that they share a common set of beliefs and intentions, and understand the utterances performed by their partners as intended [5]. In an idealized conversation, every speech act is understood as
intended as soon as it is performed, and an implicit common ground is held between participants along the whole of the conversation; however, in real [...] commonly interrupted and needs to be reestablished in order to proceed with the conversation. The common ground can be broken in two main types of situations: due to a lack of agreement between the [...] understanding problem. In the former case a speech act is heard correctly but the hearer fails to agree with part or all of its content. The latter case is exemplified by situations in which the message is not clear due to noise, for instance, and explicit speech acts are required to ensure that the participants are ‘engaged’ in the conversation. We also postulate that the understanding plane includes situations in which the referent is not determined enough due to its ambiguous or vague nature.

Speech acts need to be distinguished from the utterances that express them, and the same utterance may express more than one speech act, possibly in different conversational structures. For instance, an okay may express a commit in the obligations structure, an accept in the agreement plane, and an acknowledgment in the understanding plane [...]

There are constraints on the relation between speech acts; an action directive, for instance, needs to be paired with an action, and an information request with an answer; in the common ground, a hold act must be paired with an accept act, when the assertion that was put on hold is finally agreed upon, and an overt misunderstanding signal, like what did you say?, must be paired with an utterance that supplies the missing information. If a transaction satisfies all the stated constraints it is said that it is [...]

Summarizing, we define the obligations and common ground structure as the relation between speech acts in a conversational transaction, in addition to a number of constraints on such relation. In the rest of this paper, the specification of a theory of dialogue acts and conversation based on the explicit realization of the obligations and common ground structure is presented. In Section 2 the basic theory and its associated tagging scheme are introduced. In Section 3, the theory is illustrated with the analysis of a transaction of a task oriented conversation. In Section 4 a summary of the empirical work supporting the theory and an evaluation exercise are presented. Finally, in Section 5, a discussion of the theory, including its relation to previous work on obligations and grounding, and its potential applications for developing conversational [...]

2. The DIME-DAMSL Tagging Scheme

The notions of conversational obligations and grounding have a long tradition in philosophy, linguistics, psychology and AI, and have been applied to the definition of dialogue managers [1,9]; however, these structures are not reflected directly in annotation schemes, like DAMSL [6,3], which has been used for dialogue analysis [1]. DAMSL distinguishes between the communicative status, the information level and the forward and backward looking functions of utterances, but discourse obligations and common ground acts are distributed implicitly in these four main dimensions. In particular, utterances expressing obligations, like action directives or information requests, are the prominent part of the forward looking functions, but there are also forward looking functions related to the common ground, like an affirm act introducing new information, that must be acknowledged by the hearer; conversely, although most explicit tags of the backward looking functions are mainly concerned with grounding, there are also some backward functions, like answers, that belong to the [...]

In the present investigation we develop on the DAMSL tagging scheme and, on the basis of the analysis of the DIME Corpus, the DIME-DAMSL tagging scheme has been introduced [8,10]. In this scheme, all four dimensions of DAMSL are considered, but in addition, the structure of obligations and common ground is made explicit; the specification of this structure includes the definition of a set of speech act types, and also the specification of the constraints that the actual performance of these acts should satisfy. This relation is defined in terms of the ‘charge’ and ‘credit’ import of these acts; for instance, an action directive creates an obligation’s charge, and this charge is only credited when the corresponding action is performed later on in the transaction.

An action directive charges the common ground too, but this charge is credited immediately as soon as the act is understood and agreed upon by the interlocutor. In accordance with basic accountability
principles, a transaction is balanced when all the charges made in both the obligations and common ground have been credited. The current specification of the scheme is presented in Tables 1 and 2.

Table 1. Balancing relations for obligations

In Table 1, Action is the act of pointing to an object, a zone, a path, etc., or the placing, moving or deleting of a domain object in the design space. Table 1 also states if a charge is made immediately at the time the speech act is performed by the speaker (i.e. I) or whether it is postponed until it is accepted by the hearer (i.e. P). The table also specifies whether the charge is on the hearer or on the speaker [...]

The common ground is defined by agreement acts, related to the shared set of beliefs agreed along the dialogue, and by understanding acts, related to the communication channel. In normal conversation, it is assumed that the content of an utterance is accepted by the interlocutor by default, and most forward looking obligation speech acts are accepted implicitly; however, there are also agreement acts that are expressed by explicit speech acts. We have observed two main cases: (1) the common ground has been broken (e.g. by a referential failure), and needs to be repaired, and (2) the common ground is reinforced by the explicit realization of speech acts. The common ground relations are summarized in [...]

Agr-action = {accept | accept-part | hold | maybe | [...]
Understanding-Act = {acknowledgment | back-channel | repetition | rephrase | complementation | [...]

Agreement charges are made immediately at the time the speech act is produced, and are normally credited implicitly by the next utterance produced by the interlocutor. Understanding dialogue acts may express that the common ground needs to be strengthened; for instance, when the purpose of an utterance is to provide feedback (e.g. acknowledgments, back-channels, etc.), reinforcing the belief of the speaker that the hearer is engaged; these acts are also normally credited implicitly by the interlocutor through the normal continuation of the dialogue. This level also includes explicit non-understanding signals that the common ground has been lost (e.g. what did you say?) and needs to be [...]

Finally, ambiguous or vague acts charge the understanding plane too, but acts of this kind are credited later on when the ambiguity is resolved or the vague reference is fixed. Also, unlike all other common ground act types, ambiguous and vague charges may be credited by the interlocutor that made the corresponding charge in the first place.

Table 2. Balancing relations for the common ground plane

3. The Transaction’s Structure

To illustrate this machinery, the analysis of a typical transaction of the DIME Corpus is presented in Table 3. The column # stands for utterance number, T for the turn (System or User), and the numbers in the charge and credit columns index the utterance that expressed the corresponding speech acts, for the obligations, agreement and understanding planes [...]

The first utterance in this transaction is an offer which creates a charge in the agreement plane, as offers need to be accepted or rejected; through utterance 2, U accepts the offer, crediting the agreement charge and placing an obligations charge on S, as the system has now the obligation to perform the promised action; the main intention of the transaction is stated in 3 by U; this action directive places a charge on S in the obligations plane, and this charge is consistent with the offer made by S itself in the initial utterance. The action directive also places a charge in the agreement plane, which is explicitly accepted by S in 4.

Utterance 5 is an open option made by S; although this type of speech act is normally stated through a [...] considered an affirm act, as its purpose is not to enrich the set of beliefs of the interlocutor (i.e. to add a proposition in its knowledge base) but simply to allow him to choose from a predefined set of possible courses of action; also, the open option does not charge the obligations plane, as the interlocutor has no obligation to do anything about it; however, the open option does charge the common ground, as it needs to be accepted or rejected either explicitly or implicitly by the normal flow of the conversation. Next, U determines further the main intention through an affirm act in 6, and accepts implicitly the open option; although this utterance has, perhaps, an imperative connotation at the surface level, it is not considered an action directive as its purpose is to make a choice (supported by a pointing act) within the context of the main transaction’s intention; however, U needs to be sure that S took notice, and the affirm act charges the common ground; in 7, S accepts explicitly U’s choice, and credits the corresponding charge.

At this point of the transaction the main intention (i.e. to place a stove) and one of its arguments (i.e. what particular stove) have been fully determined, but the second argument, the location where the stove will be placed, is still to be specified. This is carried out from utterance 8 to 13. In 8 U states the desired location through an affirm act, with the corresponding charge in the agreement plane; however, the statement involves a definite description (the far wall) which is ambiguous; in the 2-D and 3-D views (of the interface where the corpus was collected) there are two walls that can be the referent, depending on the position adopted by the speaker in relation to the working space; for this reason, 8 charges also the understanding plane with [...]

The spatial ambiguity is noticed by S, who utters the [...] corresponding charge in the obligations plane; this question is also a hold act that postpones accepting 8 in the common ground. At this point both U and S are aware that the common ground has been broken and needs to be restored; for this, a problem-solving process to resolve the referent of the remaining spatial argument is started. In 10, U answers 9 through an affirm act (i.e. here) at the time a spatial zone is pointed at (i.e. the zone corresponding to the far wall). The answer act credits the obligation plane, but the affirm act needs to be accepted and makes a new charge to the agreement level. However, the question in 11 expresses that the spatial reference still needs to be confirmed (there?); accordingly, and in addition to its corresponding charge in the obligations plane, this is also a check question that puts on hold the affirm act in 10. The answer in 12 credits this charge, and U resolves the spatial ambiguity that he himself had introduced in 8, crediting the corresponding charge in the understanding plane of the common ground [...]

Through 13, S accepts the postponed affirm acts in 8 and 10, which were uttered by U, making the corresponding credits to the understanding plane. At this point the main intention with its two arguments has been determined, and S is able to commit to do the action requested by U in 3, making the corresponding charge on himself in the obligations plane. This concludes the intention specification [...]

The satisfaction of the intention involves a problem-solving process that has the placing of the stove as its goal; this requires pairing the spatial referent introduced with the pointing action in 10 with a reference position of the stove (e.g. the center or the bottom-left corner), and this involves the use of some design preferences and constraints adopted by the system. Finally, when the plan is decided, the
actual action is performed and expressed through the graphical modality. This action credits the pending offer in 2, the action directive in 3 and the commit in 13 in the obligations plane. The graphical act makes also an affirm charge in the agreement level, as U needs to agree with the result of this action. To conclude the transaction, S makes a [...] corresponding charge in the obligations plane, and this question is credited with U’s answer in 16; finally, the graphical act is credited in the common ground with an accept act expressed by 16 too.

Table 3. Analysis of a transaction (originally in Spanish)

Table 3 also illustrates that the structure of the transaction can be partitioned in two main phases: intention specification and intention satisfaction, and that the kind of speech act is highly dependent on the part of the transaction where it occurs; for instance, a commit is very likely to occur at the end of the intention’s specification phase, but very unlikely to occur elsewhere. Also, the main intention is very likely to be expressed at the beginning of the transaction, and this is made very often in our corpus through declaratives (more than 70%) and interrogatives (about 20%); other utterances appearing within the context of the main intention are very unlikely to be action directives, unless they appear in embedded transactions. More generally, the structure of the transaction may be very helpful to interpret direct and indirect speech acts, as the interpretation process may be construed as finding the most likely speech act type given the place in the transaction’s structure in which the utterance is expressed, taking into account the actual lexical content and syntactic form of the utterance, [...]

Also, unlike written language in which sentences are produced as ready-made units, intentions in [...] incrementally. Although the main intention is understood through the meaning of the main verb in the interpretation domain and context, working out the referents of the verbal complements is produced by an incremental problem-solving process, as illustrated by the example in Table 3. In this, as in many transactions in our corpus, the accusative argument is resolved first, followed by the resolution of other spatial complements.

The transaction in Table 3 does not involve vague expressions (e.g. to the left of the stove) where the spatial information needs to be further determined in order to undertake action, which are common in our corpus; however, expressions of this kind require fixing reference through a problem-solving process analogous to the resolution of the ambiguity in the present example. More generally, the resolution process for each argument may involve ambiguous and vague expressions that break the common ground, and resolving these, fixing reference and restoring the common ground, seem to be three aspects of the same underlying phenomenon.

Finally, Table 3 also shows that the obligations structure ‘dominates’ the common ground in the sense that the dependencies of the latter structure are embedded within the ones of the former. This is illustrated by the vertical line linking the statement of the main transaction intention with its actual satisfaction, which ‘encloses’ the lines linking charges with their corresponding credits in the common ground. However this dominance relation is reversed when the common ground is broken, and the acceptance of a speech act is postponed by a hold act. In this situation the obligations charges and credits are embedded within the line linking the charge made by the speech act that was put on hold with the speech act that finally accepts it and makes the corresponding credit. While in the main transaction cycle the conversation’s initiative is held by U, who imposes obligations on S (in this particular setting), when the common ground is broken the initiative is shifted to S who guides the conversation in order to reestablish the common ground; when this is achieved, the conversation’s [...]

4. Tagging Methodology and Evaluation

The presented theory was developed in conjunction with a transcription task. The exercise started from the original DAMSL scheme and its manual [6], and a team that included up to 15 taggers at a time participated in the initial training phase. Next, one dialogue was tagged by several people, and the kappa statistic was used to measure agreement between taggers [4]. The initial agreement scores were very low, especially for the common ground speech acts and the backwards dimension. One source of confusion was the implicit assumption that utterances express speech acts in a context independent fashion, as very few constraints between tags are defined in the original DAMSL scheme. In fact, the DAMSL manual provides explicit decision trees for agreement acts, and questions in these trees are focused on the function of a particular utterance, independently of its context in the transaction’s structure.

The theory presented in this paper evolved as a reaction to these problems. Dialogues were first thought of as sequences of transactions; also, the obligations and common ground were made explicit, and the common ground was also explicitly divided in the agreement and understanding planes of expression. Then, speech acts were classified according to these structures. In this exercise, the DAMSL dimensions (i.e. communicative status, information level and the forward and backward looking functions) were preserved, and the obligations and common ground structures were thought of as orthogonal to DAMSL dimensions, enriching the level of structure postulated in the original scheme. The relations between speech acts
within each plane of expression were modeled in terms of the charge and credit import of speech act types, and also in relation to the transaction context. In addition, as the DIME corpus is multimodal, tags for graphical actions and visual interpretations were [...]

In this exercise an Excel format was used to input the tags for all utterances in a dialogue; this format supported the original DAMSL’s tags and dimensions as well as the obligations and common ground, and the charges and credits relations. The format also allowed the semi-automatic computation of the kappa statistics for transaction boundaries, charge and credits relations and the actual speech act type tags. Through the exercise a number of conventions about the interpretation of speech acts in relation to the context, and also about the use of the tagging tools, were defined and refined. The [...]

With the tagging tool at hand, a formal experiment involving three tagging teams of three members each was developed. In this exercise two dialogues from the corpus were transcribed, in a sequence of tagging rounds; the teams were allowed to comment and discuss coincidences and discrepancies at the end of each tagging cycle and, after a few rounds, kappa statistics converged up to 0.9 for transaction boundaries, charge/credits relations and the actual DIME-DAMSL tags. This figure suggests that the agreement between taggers above chance is very good. At the moment we have 12 dialogues comprising 139 transactions with 1,702 utterances tagged with the latest version of the scheme by two expert taggers (with the exception of charges and credits for ambiguous and vague references, which we have only explored in a preliminary way). The kappa statistics for this transcription exercise are shown in Table 4. In addition, the kappa statistic for transaction boundaries is 0.83. The current results support the case for the theory, and show that the methodology and tagging scheme are reliable.

Table 4. Kappa statistics for 1,702 utterances

In addition, one expert tagger has fully tagged 8 additional dialogues and the actual tagged data comprises 20 dialogues, 269 transactions and 3283 utterances, and the scheme seems to cover most phenomena in a simple and consistent way. The video and audio, with the orthographic transcription, the charges/credits tagging and the full DIME-DAMSL annotation of the 20 dialogues is available [...]

5. Conclusion and applications

The analysis of speech acts is required in linguistic studies of discourse and conversation, and also for the construction of natural language conversational systems, especially when spoken language is involved. In the present approach, the analysis of speech acts is partitioned in two levels: the level of form and the level of content. The level of form is constituted by the obligations and common ground structures, and this level is defined in terms of the relations and constraints between speech acts in the context of the transaction, and these relations are independent of the actual beliefs and intentions expressed by the speech acts. In this sense, the obligations and common ground are abstract structures and hence linguistic generalizations that are independent of content issues and knowledge [...]

The notion of obligations has a long tradition in grammar and logic (e.g. deontic logics), and also in [...] consequences of obligation statements that depend on syntactic form are studied, but these are not [...] computational linguistics views, in turn, obligations are commonly defined as goals that arise in conversation and have to be dealt with in conjunction, and sometimes prior, to the task domain goals of the agent (e.g. [9]). Although in the present approach obligations are also thought of as goals that an agent ought to accomplish, these are stated in relation to content independent generic conversational protocols and, in this sense, the present view is orthogonal to traditional notions. The present notion of common ground is also thought of in terms of structural relations, and in this respect it resembles Clark and Schaefer’s [5] grounding model; however, unlike this latter model in which an utterance “presents” a piece of information that is “accepted” by the next utterance performed by the interlocutor, forming a grounding unit, the so-called “contribution”, the present model postulates that there is a set of speech acts that perform specific grounding functions when they are required (avoiding infinite accepting loops). Finally, the explicit demarcation of the obligations and the common ground, with the corresponding constraints and balancing relations, provides for a simple and general model for the analysis of task oriented [...]

The main aim of the present view of dialogue structure is the construction of conversational systems in practical dialogues, where a dialogue can be analyzed as a sequence of task oriented transactions. We hold that typical transactions, in turn, can be modeled through dialogue models representing the obligations and common ground structures; in a finite state graph or a recursive transition network, for instance, states can represent conversational situations and arcs the type of speech acts that relate situations. In this view, navigation through dialogue models depends on the ability to identify the speech act types expressed by utterances, taking advantage of the context and, perhaps, prosodic information [7]. We also hold the hypothesis that this recognition is mainly a bottom-up process; however, when dialogue act types are available, issues of content can be addressed in a top-down fashion; for instance, when it has been established that an utterance expresses an action directive, lexical and syntactic information could be used to determine the specific action to be performed by the system. Also, most common ground speech acts are interpreted directly, and this interpretation requires little lexical and syntactic [...]

In the present approach, issues related to discourse structure and reference resolution, both anaphoric and indexical, may also be simplified, as top-down interpretation processes focus on the resolution of the arguments of specific instances of speech act types, when the type of the speech act in question is available already. In summary, the present theory is aimed at the construction of conversational systems in practical dialogues where the complexity of pragmatic inference can be reduced by the incorporation of dialogue models representing the obligations and common ground structures of typical transactions of the conversational domain.

In a more theoretical setting and according to the present view, a cooperative transaction can be seen as a cooperative problem-solving process in which an intention with its arguments is specified incrementally, followed by its satisfaction. In particular, expressions filling the intention’s argument positions are initially understood through meaning, but such expressions have referents which need to be resolved, fixed or determined in order to act. The resolution of each of these arguments becomes an embedded subproblem that is also solved cooperatively. The resolution of ambiguous or vague spatial referents, in particular, is an incremental problem-solving process that is often concluded with an explicit ostension, and this deictic act fixes reference and restores the common ground at the same time. In this view, anaphoric and indexical resolution is subsumed in a process that aims to resolve the referents through a problem-solving process involving conventional meanings, knowledge about the conversational domain, the transaction’s structure, and an interaction with the [...]

Acknowledgments

We thank the support of the people of the DIME group at IIMAS, UNAM; in particular, Patricia Pérez Pavón and Haydé Castellanos. We also acknowledge useful comments by the anonymous reviewers of this paper. The theory and experiment reported in this paper are being developed within the context of the DIME project, with partial support of [...]

References

[1] J.F. Allen, D.K. Byron, M. Dzikovska, G. Ferguson, L. Galescu and A. Stent. Toward Conversational Human-Computer Interaction. AI Magazine, 22(4):27–38, Winter, 2001.
[2] J.F. Allen, D.K. Byron, M. Dzikovska, G. [...] Architecture for a generic dialogue shell. Natural Language Engineering, 6(3–4):213–[...]
[3] J.F. Allen and M.G. Core. Draft of DAMSL: [...] Annotation Scheme. Department of Computer Science, Rochester University, October, 1997.
[4] Jean Carletta. Assessing agreement on [...]
[5] H.H. Clark and E.F. Schaefer. Contributing to Discourse. Cognitive Science, 13:259–294, 1989.
[6] M.G. Core and J.F. Allen. Coding Dialogs with the DAMSL Annotation Scheme. Department of Computer Science, Rochester University, 1997.
[7] S.R. Coria and L.A. Pineda. Predicting dialogue acts from prosodic information. CICLing 2006, Lecture Notes in Computer Science, Springer-Verlag, Berlin Heidelberg, pages 355–365, 2006.
[8] L.A. Pineda, H. Castellanos, S. Coria, V. Estrada, F. López, I. López, I. Meza, I. Moreno, P. [...] transactions in practical dialogues, CICLing 2006, Lecture Notes in Computer Science, Springer-Verlag, Berlin Heidelberg, pages 331–342, 2006.
[9] D.R. Traum and J.F. Allen. Discourse Obligations in Dialogue Processing. In Proc. of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL-94), pages 1–8, June 1994.
[10] L.A. Pineda, V.M. Estrada and S.R. Coria. The obligations and common ground structures of task oriented conversation. Proc. of the Fourth Workshop [...] Technology TIL-2006. Brazil, 2006.
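
The charge and credit bookkeeping described in Section 2 can be illustrated with a small sketch. It is not the authors' tagging tool: the class name `Ledger`, its methods and the plane labels are hypothetical, and the example covers only the first four utterances of the transaction analyzed in Section 3 plus the final graphical action, as they are described in the text. It assumes that a charge is identified by the number of the utterance that made it.

```python
from dataclasses import dataclass, field

# The three planes named in the paper: obligations, plus the agreement and
# understanding planes of the common ground.
PLANES = ("obligations", "agreement", "understanding")


@dataclass
class Ledger:
    """Open charges per plane for one transaction (hypothetical helper)."""

    open_charges: dict = field(
        default_factory=lambda: {plane: set() for plane in PLANES}
    )

    def charge(self, plane: str, utterance: int) -> None:
        """Record a charge made by the given utterance on a plane."""
        self.open_charges[plane].add(utterance)

    def credit(self, plane: str, utterance: int) -> None:
        """Credit the charge made by the given utterance; it must be open."""
        if utterance not in self.open_charges[plane]:
            raise ValueError(f"no open charge from utterance {utterance} in {plane}")
        self.open_charges[plane].remove(utterance)

    def is_balanced(self) -> bool:
        """A transaction is balanced when every charge has been credited."""
        return all(not pending for pending in self.open_charges.values())


ledger = Ledger()
ledger.charge("agreement", 1)      # 1: offer by S must be accepted or rejected
ledger.credit("agreement", 1)      # 2: U accepts the offer
ledger.charge("obligations", 2)    # 2: S is now obliged to perform the action
ledger.charge("obligations", 3)    # 3: action directive by U
ledger.charge("agreement", 3)      # 3: the directive must be agreed upon
ledger.credit("agreement", 3)      # 4: S accepts explicitly
print(ledger.is_balanced())        # False: obligations 2 and 3 are still open
ledger.credit("obligations", 2)    # graphical action credits the pending offer
ledger.credit("obligations", 3)    # and the action directive
print(ledger.is_balanced())        # True
```

Keeping one set of open charges per plane mirrors the separation between the obligations structure and the two planes of the common ground; whether a charge is immediate or postponed, and on which participant it falls, is left out of this sketch.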

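
The agreement figures reported in Section 4 are kappa statistics. As a hedged illustration of what such a figure measures, the sketch below computes Cohen's kappa for two taggers. This is an assumption: the paper cites [4] for the statistic but does not state which variant or how many raters entered each computation, and the function name `cohen_kappa` and the toy tag lists are invented for the example.

```python
from collections import Counter


def cohen_kappa(tags_a, tags_b):
    """Cohen's kappa for two taggers labelling the same utterances.

    kappa = (P_o - P_e) / (1 - P_e), where P_o is the observed agreement
    and P_e the agreement expected by chance from each tagger's own
    label distribution.
    """
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("both taggers must label the same non-empty sequence")
    n = len(tags_a)
    observed = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    freq_a = Counter(tags_a)
    freq_b = Counter(tags_b)
    expected = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)
    if expected == 1.0:
        # Both taggers used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy data, not taken from the DIME Corpus.
tagger_1 = ["accept", "accept", "hold", "hold"]
tagger_2 = ["accept", "hold", "hold", "hold"]
print(cohen_kappa(tagger_1, tagger_2))  # 0.5
```

In the toy data the taggers agree on 3 of 4 utterances (P_o = 0.75) while chance agreement is 0.5, so only half of the agreement above chance that was possible has been achieved.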

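
Section 5 suggests that transactions can be modeled as a finite state graph whose states are conversational situations and whose arcs are speech act types. The following minimal sketch shows that idea only; the state names, the arc set and the function `run` are hypothetical and are loosely based on the two phases (intention specification and intention satisfaction) described in Section 3, not on a dialogue model given in the paper.

```python
# Arcs of a toy dialogue model: (current state, speech act type) -> next state.
# State names and arcs are illustrative assumptions.
TRANSITIONS = {
    ("start", "offer"): "offered",
    ("offered", "accept"): "awaiting_directive",
    ("awaiting_directive", "action-directive"): "specifying",
    ("specifying", "affirm"): "specifying",   # arguments are added incrementally
    ("specifying", "accept"): "specifying",
    ("specifying", "commit"): "satisfying",   # end of intention specification
    ("satisfying", "action"): "done",         # intention satisfaction
}


def run(acts, state="start"):
    """Follow the arcs labelled by each speech act type and return the final state."""
    for act in acts:
        key = (state, act)
        if key not in TRANSITIONS:
            raise ValueError(f"speech act {act!r} is not expected in state {state!r}")
        state = TRANSITIONS[key]
    return state


acts = ["offer", "accept", "action-directive", "affirm", "accept", "commit", "action"]
print(run(acts))  # done
```

A speech act type that has no arc from the current state raises an error here; a real dialogue manager would more plausibly use that situation to trigger a repair of the common ground.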
