Demonstrations¶
The SERA showcase demonstrates the operation of the system used in the experimental setting and exemplifies the theoretical results and their reflection in the development of a reference architecture. Optionally, the state of speech recognition and its usefulness for similar scenarios can also be demonstrated with the system used in the experimental setting.
This section lists and explains the different available demonstrations in a sequence that reflects their historical development in the project. The first two demonstrate the hardware capabilities and the software used in the field study as described in The set-up. The Interaction Episodes show parts of the interactions that could occur during the field study, while the final demo illustrates a particular aspect of the Reference Architecture developed in the project.
Note
All the programs described below can be found in the programs sub-directory of the showcase. For more details on the technical setup and the requirements for running the different demos, see Preparations.
The system setup in the field study¶
Start the demo with one of the following scripts or commands:
startSERASystemDemo.bat
startSERASystemDemo.sh
java -jar SERASystemDemo.jar
SERASystemDemo demonstrates the input and output capabilities of the hardware and software used in the field study. When you start the demo, a single window appears with four distinct areas: at the top, the conversation between user and system as seen by the system; below it on the left, the internal state of the system; on the right, controls for simulating the different input modalities that the system allows; and in the middle, a representation of the Nabaztag hardware and the state of its ears and lights if a connection has been established.
The different areas of the GUI for the SERA System Demo.
Initially, the system reacts to all possible hardware events. Once the conversation has been started, either by indicating the presence of a user (pressing the infrared sensor (PIR) button) or by showing the Arr! card, it reacts to all known utterances. The codes of available RFID cards and thus of known utterances are:
- “Smiley” icons: smile, neutral, frown (also corresponding to yes, don’t know, and no, respectively).
- Numbers: 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60.
- Topic cards: weather, summary, message, system, addtolog.
- Interaction cards: repeat, Arr! (the latter acting as an interrupting “shut up” card).
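Because the smiley cards double as yes, don’t know, and no, card codes and spoken words can be normalised to a single answer before any dialogue logic runs. The following is a minimal sketch of that idea, not the showcase's actual code; the class `CardVocabulary`, the enum `Answer`, and the method `toAnswer` are hypothetical names introduced for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: maps smiley cards and spoken words to one answer. */
final class CardVocabulary {
    enum Answer { YES, DONT_KNOW, NO }

    // Smiley cards and their spoken equivalents, as listed above.
    private static final Map<String, Answer> ANSWERS = Map.of(
            "smile", Answer.YES, "yes", Answer.YES,
            "neutral", Answer.DONT_KNOW, "don't know", Answer.DONT_KNOW,
            "frown", Answer.NO, "no", Answer.NO);

    /** Returns the answer a code stands for, or empty for any other card. */
    static Optional<Answer> toAnswer(String code) {
        return Optional.ofNullable(ANSWERS.get(code.trim().toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        System.out.println(toAnswer("smile"));   // prints Optional[YES]
        System.out.println(toAnswer("weather")); // prints Optional.empty
    }
}
```

Topic, number, and interaction cards fall through to the empty result here, so a caller can handle them separately.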
If actual Nabaztag hardware has been set up, it can be used to show RFID cards to the system and to trigger Nabaztag-specific events: moving its ears or clicking its button. If the speech server has been started, speech input can be used. The system will recognise all of the above utterances that correspond to RFID cards.
Typically, this demonstration is used to introduce the hardware and the basic operation of the system setup. Starting the system, pressing the PIR button to indicate that the system has detected someone nearby, then showing each RFID card and, if ASR is used, speaking the corresponding utterances illustrates the operation of the system in the field. If you wait without responding, the system produces encouragement statements.
The left part of the program’s window shows the internals of the system. The first element shows the current State, consisting of an overall communication state (either SLEEP, ALERT, LISTENING or TALKING) and a script state, which might also be empty (indicated by <no state>). The latter specifies the input events that the system currently responds to, its utterances and behaviour, as well as encouragement statements if it does not detect any responses while listening. Below that is a display that lists Memory elements used by these states (e.g. when the last evaluation dialog was completed, whether daily routines should be initiated, or a counter of how often the system has already talked in the current interaction). The Pause Settings relate to the transitions between SLEEP, ALERT, LISTENING, and TALKING. The system will listen for responses and, if there are none, it will encourage responses. The pauses between encouragements and the time needed to “fall asleep” again are governed by these settings (which are lower than the ones used in the actual field study to reduce waiting time during demonstrations). Clock and Diary show the system’s knowledge of the current time and the activities planned by its user, respectively. Finally, at the bottom, is an indicator for the activity of the camera used in data collection. A red ring signals that the camera is recording, while a green dot inside the circle is present for recordings that have been acknowledged by the user and will therefore be kept (see How data were collected).
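The interplay between listening, encouragement, and falling asleep can be pictured as two timers. The sketch below is only an illustration under stated assumptions: the class `PauseSketch`, its two settings, and the decision to skip the ALERT and TALKING states are all hypothetical simplifications, and the real transitions are defined by the scripts.

```java
/** Hypothetical sketch of how pause settings could drive encouragement and sleep. */
final class PauseSketch {
    enum CommState { SLEEP, ALERT, LISTENING, TALKING }

    private final long encouragePauseMs; // assumed: silence before each encouragement
    private final long sleepPauseMs;     // assumed: total silence before falling asleep
    private CommState state = CommState.SLEEP;
    private long lastInputMs;
    private long lastPromptMs;

    PauseSketch(long encouragePauseMs, long sleepPauseMs) {
        this.encouragePauseMs = encouragePauseMs;
        this.sleepPauseMs = sleepPauseMs;
    }

    CommState state() {
        return state;
    }

    /** Any user input (presence or a response) restarts both timers. */
    void onUserInput(long nowMs) {
        state = CommState.LISTENING;
        lastInputMs = nowMs;
        lastPromptMs = nowMs;
    }

    /** Called periodically; returns true when an encouragement is due. */
    boolean tick(long nowMs) {
        if (state != CommState.LISTENING) {
            return false;
        }
        if (nowMs - lastInputMs >= sleepPauseMs) {
            state = CommState.SLEEP; // silence lasted too long
            return false;
        }
        if (nowMs - lastPromptMs >= encouragePauseMs) {
            lastPromptMs = nowMs;    // encourage, then wait again
            return true;
        }
        return false;
    }
}
```

Lowering both values, as the demo does relative to the field study, shortens the waits without changing the behaviour.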
The right part of the interface allows simulation of input events. The top three buttons correspond to the special hardware sensors used in the field study: the infrared sensor (PIR), the hook for house keys, and the button to allow video recordings. Note that these buttons simulate to some degree the behaviour of the actual experiment setting: the PIR sensor does not react for some time once it was activated and the keys button has two states, down and up (indicated by the button background colour). Directly below is an entry field to simulate a message sent by someone remotely (in the experiment, sent by the researchers; note that the availability of new messages is only checked in rather large intervals, so the message will not be relayed immediately). The remaining buttons correspond to the utterances known by the system which also correspond to a physical RFID card as well as the other events that can be detected using the Nabaztag hardware. The latter are forced movement of the rabbit’s ears, and pressing the button on its head either once or twice.
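The two behaviours that the simulated buttons copy from the real sensors, the PIR's quiet period and the two-state key hook, can be sketched as follows. This is an illustrative sketch, not the demo's implementation: the class `SimulatedSensors`, the `pirQuietMs` parameter, and the event names are assumptions. The interpretation of the hook as coming home or leaving follows the exercise episode described later in this document.

```java
/** Hypothetical sketch of the simulated PIR button and key hook. */
final class SimulatedSensors {
    enum KeyEvent { COMING_HOME, LEAVING }

    private final long pirQuietMs;      // assumed length of the PIR's unresponsive period
    private long pirReadyAtMs = 0;
    private boolean keysOnHook = false; // the hook starts in the "up" position

    SimulatedSensors(long pirQuietMs) {
        this.pirQuietMs = pirQuietMs;
    }

    /** Returns true if the press produced a presence event. */
    boolean pressPir(long nowMs) {
        if (nowMs < pirReadyAtMs) {
            return false; // still in the quiet period after the last activation
        }
        pirReadyAtMs = nowMs + pirQuietMs;
        return true;
    }

    /** Toggles the key hook and reports how the change is interpreted. */
    KeyEvent pressKeys() {
        keysOnHook = !keysOnHook;
        return keysOnHook ? KeyEvent.COMING_HOME : KeyEvent.LEAVING;
    }
}
```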
Scripting System¶
Start the demo with one of the following scripts or commands:
startSERAScriptEditor.bat
startSERAScriptEditor.sh
java -jar SERAScriptEditor.jar
The script editor can be used to inspect and edit the set of scripts used to control the behaviour of the system in the field study. By default, it loads the complete set of scripts used during the third iteration of the field study. On the left of the window, the known states are displayed in a “lazily loading” tree (i.e. the children of a state are only loaded when it is expanded). Note that the graph of states is most likely not a tree (i.e. not strictly hierarchical). Cycles will occur, so some children of newly expanded states will already be present in another part of the displayed hierarchy. This is indicated by the text (already loaded) appended to the state’s name, and such a state is only available at its first position in the hierarchy.
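Displaying a cyclic graph as a lazily expanded tree comes down to remembering which states are already shown. A minimal sketch of this technique follows; it is not the editor's source code, and the class `LazyStateTree` and the state names used in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: a state graph shown as a lazily expanded tree. */
final class LazyStateTree {
    private final Map<String, List<String>> successors; // state name -> successor names
    private final Set<String> shown = new HashSet<>();
    private final Map<String, List<String>> expanded = new HashMap<>();

    LazyStateTree(Map<String, List<String>> successors, String root) {
        this.successors = successors;
        shown.add(root);
    }

    /** Computes the child labels of a state the first time it is expanded. */
    List<String> expand(String state) {
        List<String> cached = expanded.get(state);
        if (cached != null) {
            return cached; // expanding twice must not change the labels
        }
        List<String> labels = new ArrayList<>();
        for (String child : successors.getOrDefault(state, List.of())) {
            // Set.add returns false when the state is already displayed elsewhere.
            labels.add(shown.add(child) ? child : child + " (already loaded)");
        }
        expanded.put(state, labels);
        return labels;
    }
}
```

With a graph in which `DefaultState` leads to `Greeting` and `Weather`, and `Greeting` leads back to `Weather` and `DefaultState`, expanding `Greeting` after the root yields two entries that are both marked as already loaded.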
The top-most state will always be called DefaultState. This state is used to determine reactions to hardware-events, independent of the currently active state.
To the right of the state-tree, the contents of the currently selected state are displayed in three tabs: Execution, Openings, and Finish. Please note that the format for these states was designed with speed and agility of development for this specific application in mind rather than as a general and readable specification format. Every state directly reflects one Java class that is found in a state-set, and such a set corresponds directly to a directory in the filesystem. The Openings tab determines the actions performed when this state becomes active. The Finish tab provides a way to inspect the occurrences of this state in the state-graph and a way to explicitly save changes. The most interesting is the Execution tab, which specifies the rules used to determine actions and state-transitions based on events and internal constraints. Each line corresponds to one rule. A rule reacts to a pattern (utterances or system events separated by |), provided that an optional constraint referring to memory elements is true. It then says an (optional) response and performs an action, which can set memory elements or change the current state.
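The four parts of a rule (pattern, optional constraint, optional response, action) can be modelled in a few lines. The sketch below does not reproduce the actual script format or the project's classes; `RuleSketch`, `Rule`, and `handle` are hypothetical names, memory is simplified to a string map, and the choice to fire only the first matching rule is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical model of one Execution-tab rule and its evaluation. */
final class RuleSketch {
    static final class Rule {
        final Set<String> events;                        // alternatives from "a|b|c"
        final Predicate<Map<String, String>> constraint; // optional, refers to memory
        final String response;                           // optional, may be null
        final Consumer<Map<String, String>> action;      // sets memory or changes state

        Rule(String pattern, Predicate<Map<String, String>> constraint,
             String response, Consumer<Map<String, String>> action) {
            this.events = new HashSet<>(Arrays.asList(pattern.split("\\|")));
            if (constraint == null) {
                constraint = memory -> true; // a missing constraint always holds
            }
            this.constraint = constraint;
            this.response = response;
            this.action = action;
        }

        boolean matches(String event, Map<String, String> memory) {
            return events.contains(event) && constraint.test(memory);
        }
    }

    /** Fires the first matching rule (an assumption) and returns its response. */
    static Optional<String> handle(List<Rule> rules, String event, Map<String, String> memory) {
        for (Rule rule : rules) {
            if (rule.matches(event, memory)) {
                rule.action.accept(memory);
                return Optional.ofNullable(rule.response);
            }
        }
        return Optional.empty();
    }
}
```

A rule such as `new Rule("yes|smile", null, "Great.", m -> m.put("state", "Next"))` would then react to either event, say its response, and record a state change in memory.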
Using the Sequence menu, you can load different sequences (script-sets) and execute them, which starts the system interface demonstrated in SERASystemDemo. The script-sets can be found in the scriptsets subdirectory:
- basic/: This set is used in SERASystemDemo by default to demonstrate the hardware capabilities of the system as described in the previous section.
- complete/ and editordefault/: These two sets represent the behaviour as it was in the final iteration of the field study. The duplication is simply a measure to prevent unintended changes in the complete set when the editor is demonstrated.
- exerciseDemo/ and feelingsDemo/: These two sets correspond to the two episodes described in the following section.
- memory-complete/: This set was developed as part of the project C4U - Companions für Userinnen in close collaboration with SERA. It extends the system internals with a more complex model of the user that can be used to relate detected activities with planned activities in order to adapt to the user’s actual rather than his or her scheduled behaviour.
Interaction Episodes¶
The following two programs demonstrate typical interaction episodes as they occurred during the field study, including relevant alternatives. Most of the shown interactions were also used in the field study. These demos simplify them and add setup rules to allow a demonstration without actually going through the whole system dialogues that precede these episodes.
Discussing exercise activities¶
Start the demo with one of the following scripts or commands:
startSERAExerciseEpisode.bat
startSERAExerciseEpisode.sh
java -jar SERASystemDemo.jar --scriptsDir=./scriptsets/exerciseDemo/
The only scripts active in this demo are related to adding exercise activity to the log. As the demo starts, all events except two will be ignored (or rather acknowledged as unsupported):
- The key hook: Lifting or depressing the key hook, interpreted as someone leaving or someone coming home, can be used to initiate the interaction. As the key hook starts in the up position, the first press will always be interpreted as someone coming home.
- A double-press of the Nabaztag’s head button: This can be used to add an artificial just-completed event to the user’s diary.
The demo can be used to show the system’s interaction when the user leaves the house or comes home, either when it knows about a planned or completed activity or when it does not. A typical long demonstration could be:
- Press the keys button.
- The system will welcome you home and ask if you had a good time and if you want to add something to your exercise log.
- Answer using “yes”, “no” or the corresponding “smile”, “frown” cards as well as the number cards for adding exercise to the log.
- Once the system goes back to sleep, press the keys button again.
- As the system does not know about any activity, it will ask if you are going out. (Note that if you answer that you are not going out, the system will not initiate the following dialogues, as it knows you did not go out. In order to demonstrate other alternatives, simply restart the demo.)
- Once the system goes back to sleep, double press the Nabaztag button.
- The system will answer that it has added a new activity.
- Press the keys button again.
- The system now asks about your completed activity and asks about adding the amount of activity to the log.
- Once the system goes back to sleep, press the keys button again.
- The system will now acknowledge that you are going out for the known activity.
Of course, any interaction path can be tried. Note that the alternative topic cards (summary, message, system and weather) will not be recognised by the system in this demo. The following lists one concrete scenario that can be played out using this demo and (nearly) all the interaction possibilities from the perspective of the system. Note that the SERAScriptEditor is another useful way to inspect this script-set.
Possible scenario¶
- After starting the demo:
- Hang up the keys: the user returns home but does not react to the system.
- The system asks: “Did you have a good time?”
- As the user does not react, the system goes back to sleep after some seconds.
- Double-press Nabaztag button: create an artificial event “walking the dog”
- Remove the keys: the system informs the user: “From your plan it looks like you’re going to walking the dog. See you soon.”
- The user is actually going to the grocery store so he answers with “no” (“frown”)
- The system asks: “Are you still going out?”
- The user answers “yes”/”smile”
- The system wishes the user a nice time.
- The system goes back to sleep.
- The user returns from the grocery store: hang up the keys
- The system now knows, due to the previous interaction, that the user should be going out with his dog, but that he actually went to the grocery store, so it simply asks: “Did you have a good time?”
- The user answers yes
- The system asks: “Would you like me to add something to your exercise log?”
- The user indicates 30 minutes of exercise by using the appropriate card.
- The system asks to confirm this number and adds “I’ll put that into your log now”.
- The user changes his mind and says 40.
- The system now asks to confirm again: “Sorry, did you want me to add a different amount instead?”
- The user confirms by saying 40 again.
- Now the system confirms the corrected amount with “I’ll put that in your log now.”
- The user was distracted and did not listen to the rabbit, so he uses the repeat card to hear the message again.
- The system repeats: “I just said I’ll put this in your activity log.”
- Finally, the system mentions the total activity of the day: “That makes a total of 40 minutes of exercise so far today.”
- After some time, the system announces that it goes back to sleep: “Unless you’d like to talk about anything else, I’ll talk to you later.”
Interaction steps¶
If there was an activity in the diary, or one was added artificially by pressing the button:
| 1: | The rabbit asks about the action and prompts its addition to the log. The activity can be added as it is defined in the diary (1 and more minutes) or adjusted to the actual situation. The question is: “Did you have a good time ” + description of activity. If the activity has been marked as wrong (the user doing something different than what is stated in the diary), the companion asks “Did you have a good time?” and jumps to 2.1. The companion knows default durations of daily activities, so it can jump directly to the simple addition to the log. |
|---|---|
| 1.1: | Adding the activity: (smile/yes/neutral/addtolog) - “I will put that in your log now” |
| 1.1.1: | Confirming the activity: (smile/yes/neutral/addtolog) - add the activity to the log and tell the user how much exercise he has already done today |
| 1.1.1.1: | End of interaction: if the user does not use another RFID card, the rabbit goes to sleep |
| 1.1.1.2: | Add another activity (addtolog/10-60) - start the process again (1.1) |
| 1.1.2: | Request different amount/correction: (no/frown/10-60) - change the original amount to a new one and continue with confirming the activity duration (1.1.1) or adjusting it again (1.1.2). |
| 1.2: | Adding no activity: there is no such possibility. We assume in this demo that if the user returned home, he must have performed some exercise. |
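The confirm-or-correct loop in rows 1.1.1 and 1.1.2 can be sketched as a tiny dialogue object. This is a simplified illustration covering only those two rows, not the script-set itself; the class `ExerciseLogDialog` and its methods are hypothetical.

```java
/** Hypothetical sketch of rows 1.1.1 (confirm) and 1.1.2 (correct) above. */
final class ExerciseLogDialog {
    private int pendingMinutes;
    private int totalToday;

    ExerciseLogDialog(int proposedMinutes) {
        this.pendingMinutes = proposedMinutes;
    }

    int pendingMinutes() {
        return pendingMinutes;
    }

    int totalToday() {
        return totalToday;
    }

    /** Handles one card or utterance; returns true once the amount is logged. */
    boolean onCard(String card) {
        switch (card) {
            case "smile":
            case "yes":
            case "neutral":
            case "addtolog":
                totalToday += pendingMinutes; // row 1.1.1: confirm and log
                return true;
            case "no":
            case "frown":
                return false;                 // row 1.1.2: wait for a new amount
            default:
                try {
                    pendingMinutes = Integer.parseInt(card); // row 1.1.2: corrected amount
                } catch (NumberFormatException e) {
                    // Not a number card; this sketch simply ignores it.
                }
                return false;
        }
    }
}
```

Starting from a proposed 30 minutes, the cards `40` and then `yes` leave a total of 40 minutes in the log, which corresponds to the running total the system announces at the end of the dialogue.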
In the second case, the interaction starts with no pre-planned activity for that day:
| 2: | If there was no activity planned for today, the rabbit asks how the trip was and about the duration of the activity performed. As the activity is unknown to the companion, it has to be defined by the user. The only retained value is the duration. |
|---|---|
| 2.1: | Ask the user if there is something that has to be put into the log - “Would you like me to add something to your exercise log?” |
| 2.1.1: | Adding the activity to the log (yes/smile/neutral/10-60/addtolog) - the same as 1.1.1 and 1.1.2 in case of corrections. |
| 2.1.2: | no activity (no,frown) - there is no such possibility in this demo. |
The agent is able to anticipate actions that should happen currently according to the user’s diary. This is shown by removing the keys.
| 3: | If the user is going out and there is a corresponding activity in the diary, ask the user if he is going out to [walk the dog]. |
|---|---|
| 3.1: | Confirm the activity (yes/smile/neutral) - “see you later” - end the dialog |
| 3.2: | Add activity duration to the log (addtolog) - wait until the user returns from the activity and add the correct information afterwards. |
| 3.3: | Another activity (no/frown) - ask “Are you still going out?” |
| 3.3.1: | Not going out (no/frown) - “Ah” - end the dialog and remember that the user was not going out when removing the keys. |
| 3.3.2: | Going out (neutral/smile/yes) - tell the user to have a nice time, mark wrong activity. |
| 4: | If the user is going out and there is no corresponding activity in the diary, ask the user if he is going out. |
|---|---|
| 4.1: | Not going out (no/frown) - end the dialog (the same as 3.3.1). |
| 4.2: | (addtolog) - wait until the user returns (same as 3.2). |
| 4.3: | Confirm going out (smile/yes/neutral) - tell the user to have a good time. |
| 4.3.1: | (addtolog) - jump to 1.1 |
| 4.3.2: | End dialog (any) - end the dialog. |
Wrapping up the day¶
Start the demo with one of the following scripts or commands:
startSERAFeelingsEpisode.bat
startSERAFeelingsEpisode.sh
java -jar SERASystemDemo.jar --scriptsDir=./scriptsets/feelingsDemo/
The only scripts active in this demo are related to asking the user about the activity performed so far (yesterday and today), as well as special scripts to set up specific situations regarding recently performed activity in the system’s memory. As the demo starts, all events except two will be ignored:
- The PIR button: Pressing the PIR button, interpreted as someone being present, can be used to initiate the interaction.
- A double-press of the Nabaztag’s head button: This can be used to start the selection of predefined memory changes to simulate a specific situation regarding the activity planned and performed today and yesterday (see below).
The demo can be used to show the different approaches of the system to asking users about their evaluation of their own activity, depending on whether they over- or underperformed relative to the activity that was planned. The initial reaction of the system when someone approaches (PIR) depends on the existence of a previous self-evaluation rating, on the amount of activity done yesterday and today, and on the relation between the amount of activity planned for today and the amount actually performed. The different initial reactions based on these conditions are, at the coarsest level:
- No exercise performed today (predefined as demo 1): “My record shows that today has been a day off exercise. How are you feeling?”
- Today’s total was less than planned (predefined as demo 2): “I just wanted to let you know that according to the log, you’ve done a total of $donetoday minutes today. How are you feeling after today’s activity?”
- Today’s total was equal to or more than planned (predefined as demo 3): “I just wanted to let you know that according to the log, today, you planned to do $plannedminutes minutes of exercise and you’ve done a total of $donetoday minutes. How are you feeling after today’s activity?”
- Yesterday’s activity has been rated but today’s total was less than planned (predefined as demo 4): “According to the log, you’ve done a total of $donetoday minutes today. How are you feeling after today’s activity?”
- Today’s total was equal to or more than planned and yesterday’s performance has already been rated by the user (predefined as demo 5): “I just wanted to remind you that yesterday, you recorded a total of $doneyesterday minutes of exercise and gave yourself $currentrating out of 5 for how you were feeling.”
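The five coarse cases listed above amount to a small decision function. The sketch below is one possible reading, not the script logic itself: the class `FeelingsOpening` and the method `select` are hypothetical, and checking "no exercise today" before the other conditions is an assumed precedence.

```java
/** Hypothetical selection of the opening statement (demo 1 to 5) listed above. */
final class FeelingsOpening {
    static int select(int doneToday, int plannedToday, boolean yesterdayRated) {
        if (doneToday == 0) {
            return 1; // a day off exercise (assumed to take precedence)
        }
        if (doneToday < plannedToday) {
            return yesterdayRated ? 4 : 2; // less than planned
        }
        return yesterdayRated ? 5 : 3;     // equal to or more than planned
    }

    public static void main(String[] args) {
        System.out.println(select(20, 30, false)); // prints 2
        System.out.println(select(30, 30, true));  // prints 5
    }
}
```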
Interaction steps¶
The best way to inspect the script-set used in this demo is the SERAScriptEditor. A general outline of the possible interaction steps:
| 1: | Ask if the user has time to chat. |
|---|---|
| 1.1: | (no/frown) Talk to the user in 15 minutes - end dialog. |
| 1.2: | Depending on the current status (see above), jump to one of the above-mentioned states (on any input) and ask the user how he is feeling. |
| 1.2.1: | No answer or an irrelevant answer (e.g. frown/no/yes/smile/10-60) - explain the rating of feelings. |
| 1.2.2: | A feeling value (1/2/3/4/5) - confirm the addition - “I will put that in your log”. |
| 1.2.2.1: | Wrong value (frown/no/1-5) - try to correct the value and return to (1.2.2). |
| 1.2.2.2: | Correct value or long wait time (yes/smile) - finish the dialog. |
| 1.3: | If there is enough information about yesterday, provide it to the user and continue with rating today’s feelings (1.2). |
Floor Management in SERA’s reference architecture¶
Start the demo with one of the scripts found in the SERAFloorDemo sub-directory of the programs directory in the showcase. Choose the script that is appropriate for your operating system and processor architecture.
Once you start this demo, an interface for simulating input from a Nabaztag (similar to the one used in SERASystemDemo) pops up together with a number of hierarchical finite state charts. Note that this demo does not communicate with actual Nabaztag hardware. Some of the state charts are minimized; others remain visible on the desktop. You can move and resize the frames to ensure that they are visible. There is also a small frame titled “HMI Nabaztaq”, usually in the upper left corner, that you should not close. User input is simulated by pressing buttons in the companion simulator, although, for this demo, not all buttons are functional. The available buttons are:
- User Present: signals that someone is present for the rabbit by dispatching a PIR event. You can press this whenever the system is sleeping to wake it up.
- Message/Weather: a user request to switch to or start an interaction about this topic.
- okay or yes
- and all other buttons (including the repeat button): denial or no
As you can see, there are separate charts that model Turn-Taking, Topic switches and the dialog states within a specific topic. State transitions in these models are synchronized. Two topics are implemented: TopicWeather and TopicMessage. These topic dialog models coincide with the dialog scripts in the implementation of the system used in the three iterations for data collection. The other charts are a refinement of the SALTE-model used to manage the transitions between the top-level states SLEEP, ALERT, LISTENING and TALKING, motivated by the wish to allow fluent user interruptions and topic switches. In this implementation, the robot knows that it is being interrupted when it has the turn and a user starts speaking. The following video shows the interaction with this demo using the HSM-based reference architecture:
Starting a basic dialog¶
To start the interaction, press “User Present” and wait. After a short while, the Nabaztag takes the initiative and starts the “MessageTopic” dialogue, asking whether you (the user) want to hear the new message. Answer by means of the “yes” or “no” button:
| U: | User Present |
|---|---|
| N: | “Hello, I have a message for you. Would you like to hear it now?” |
| U: | |
| N: | “The message is: you’ve been promoted. Do you like to hear it again?” |
| U: | |
| N: | “The message is: you’ve been promoted. Do you like to hear it again?” |
| U: | |
Explanation about what to notice within the state diagrams¶
Initially the companion is sleeping. The only way to change this state is by indicating that the user is present. This leads to the companion raising its ears to show that it is in the “Alert” state. This state also occurs in the “TurnTaking” state diagram. All other diagrams are still in their initial state; for instance, the two topic diagrams “TopicMessage” and “TopicWeather” reside in their “OutTopic” state. In the “Alert” state, either the Nabaztag or the user can take the initiative to start. The d_initTopic and the u_initTopic transitions from the “Alert” state mimic this, and will move the TurnTaking diagram to either the “HasTurn” or “NoTurn” state. (Regarding the transition labels: d_X indicates that the transition is taken when the system makes a decision to do X, while u_Y indicates that the user is observed doing Y.)
After a short while, the Nabaztag initiates an interaction about the topic “Message” by moving into the “HasTurn” state.
If there are no (more) new messages (i.e. messages the user has not heard), the companion returns to the “Sleeping” state unless the user initiates another topic (by pressing one of the topic buttons). There is a “TopicStructure” component that keeps track of topic changes. This effectively translates the d_initTopic or u_initTopic transitions into a set of d_startTopicMessage, d_startTopicWeather, u_startTopicMessage, or u_startTopicWeather transitions. These transitions occur also in the two topic related diagrams (“TopicMessage” and “TopicWeather”). The net effect is that one of these topic diagrams will move from “OutTopic” to the “StartTopic” state.
Explanation of the Non-topic/Dialogue state diagrams¶
The diagrams not directly concerned with a specific topic keep track of, for instance, dialogue state. For example, who has the floor? This is important information for interpreting dialogue acts, such as the user speaking, with regard to floor management. The “userInterrupting” diagram models that, depending on whether it is the Nabaztag’s turn to speak or not, the user speaking action (u_speak) is to be interpreted as a user interruption (u_interrupt) or as ‘nothing special’ (i.e., the “self loop” on the “NotTurn” state).
A similar mechanism can be seen in the “UserLeavingInterpretation” diagram: the interpretation of a ‘long silence’ by the user will cause the current topic to be abandoned when the system is actually in the “InTopic” state. On the other hand, when in the “OutTopic” state, the same “long silence” is interpreted as the user leaving the Nabaztag alone, and will cause the system to move to a “NoUser/Sleeping” state.
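Both interpretation diagrams follow the same pattern: one observed event is mapped to different meanings depending on the state of another diagram. A minimal sketch of that pattern follows. Only `u_speak` and `u_interrupt` are names from the demo; the class `FloorInterpretation` and the labels `abandonTopic` and `userLeft` are hypothetical.

```java
/** Hypothetical sketch of context-dependent interpretation of user events. */
final class FloorInterpretation {
    /** u_speak counts as an interruption only while the system has the turn. */
    static String interpretSpeak(boolean systemHasTurn) {
        return systemHasTurn ? "u_interrupt" : "u_speak";
    }

    /** A long silence ends the topic if one is active, otherwise the session. */
    static String interpretLongSilence(boolean inTopic) {
        return inTopic ? "abandonTopic" : "userLeft";
    }

    public static void main(String[] args) {
        System.out.println(interpretSpeak(true));        // prints u_interrupt
        System.out.println(interpretLongSilence(false)); // prints userLeft
    }
}
```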
Example dialog with interruption and topic switch¶
A most salient aspect of the demo is that dialogues can be interrupted at any time. There are two aspects here: First, interrupting will cause the Nabaztag to stop speaking. Second, an interruption can cause a direct transition from one topic, say, the “MessageTopic” dialogue, to some other topic like “WeatherTopic”. Such transitions would be cumbersome to specify in a classical finite state model. In our demo, the use of hierarchy simplifies this. For instance, the u_changeTopic transition within the “TopicMessage” or “TopicWeather” diagrams enables one to switch topic in all “InTopic” states with a single transition, e.g.:
| U: | User Present |
|---|---|
| N: | “Hello, I have a message for you. Would you like to hear it now?” |
| U: | |
| N: | “The message is: you’ve been promoted. do you like to hear it again?” |
| U: | |
| N: | “The message is: you’ve been ...” |
| U: | ! <INTERRUPT> |
| U: | WEATHER |
| N: | “would you like to hear the weather?” |
| U: | |
| N: | “The weather today is sunny, with a nice temperature.” |
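The reason a single u_changeTopic transition suffices is that, in a hierarchical state machine, a transition defined on a super-state applies to every state nested inside it. The sketch below illustrates this lookup rule in general terms and is not taken from the demo's HSM implementation; the class `HierarchicalStates` and the sub-state names in the usage note are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of transition lookup in a hierarchical state machine. */
final class HierarchicalStates {
    static final class State {
        final String name;
        final State parent; // null for a top-level state
        final Map<String, String> transitions = new HashMap<>(); // event -> target name

        State(String name, State parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    /** Looks for a transition in the current state, then in each ancestor. */
    static Optional<String> target(State current, String event) {
        for (State s = current; s != null; s = s.parent) {
            String found = s.transitions.get(event);
            if (found != null) {
                return Optional.of(found);
            }
        }
        return Optional.empty();
    }
}
```

If `InTopic` carries the only `u_changeTopic` transition, any nested state such as an illustrative `TellMessage` or `AskRepeat` inherits it, whereas a flat state machine would need one copy of the transition per state.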
Explanation about what to notice within the state diagrams¶
While the Nabaztag is talking, it is in the “HasTurn” state within the “TurnTaking” diagram and, simultaneously, the “UserInterruptingInterpretation” diagram is in the “Turn” state. (Note that the two state names “HasTurn” and “Turn” can be considered as corresponding to each other.) In this “Turn” state, any user action (modeled by u_speak) is interpreted as an interruption (u_interrupt), causing the “TurnTaking” diagram to move to the “Interrupted” state, followed (immediately, so you cannot see it happen) by a move to the “NoTurn” state. (The system could also decide to keep the turn, although in this version of the demo, it never does.)
Editing hierarchical finite state charts¶
Using a graph-based representation for the behaviour specification, as in this demo, also allows for intuitive editing of the underlying representation with graphical tools. The following video demonstrates the HSM graph editor developed as part of this demo: