Integrating Multiple Knowledge Sources For Improved Speech Understanding Sherif Abdou, Michael Scordilis Department of Electrical and Computer Engineering,


1 Integrating Multiple Knowledge Sources For Improved Speech Understanding Sherif Abdou, Michael Scordilis Department of Electrical and Computer Engineering, University of Miami Coral Gables, Florida 33124, U.S.A.

2 Abstract
The sentence produced by the decoder with the highest recognition probability is not necessarily the best choice for extracting the intended concepts. The more knowledge sources that participate in the selection process, the better the result that can be achieved. In the late-disambiguation approach, many hypotheses are allowed to propagate through the system until there is enough knowledge to select the best one. In this work, the recognition score, parsing score, dialog expectations and prosody are combined to select the best hypothesis. The scaling weights of the combined scores are determined automatically by an optimization procedure.

3 System Architecture
[Diagram components: I/O Interface, Speech Recognizer, Parser, Dialog Manager, Synthesizer, Acoustic Model, Language Model, Grammar, Goal Trees, Dialog History, Prerecorded Speech Units, Flights Database, Prosodic Utterance Classifier]

4 Domain Plans
[Figure: hierarchical goal tree with nodes Travel, User, Departure Route (Depart Loc, Arrive Loc, Depart Date, Depart Time) and Return Route (Return Date, Return Time)]
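A goal tree like the one on this slide can be held as a nested mapping whose leaves are slots waiting to be filled. The sketch below is a hypothetical encoding, not the authors' data structure: it keeps only the Departure Route and Return Route branches (the exact placement of the Travel and User nodes is not recoverable from the slide), and the helper `unfilled_slots` is an illustrative name for listing the slots still missing, the kind of information a dialog manager could turn into expectations.

```python
from typing import Any, Dict, List

# Hypothetical encoding: inner nodes are dicts, leaves hold a slot value (None = unfilled).
GoalTree = Dict[str, Any]


def new_travel_plan() -> GoalTree:
    """Build an empty plan mirroring the Departure/Return Route branches of the slide."""
    return {
        "Departure Route": {
            "Depart Loc": None,
            "Arrive Loc": None,
            "Depart Date": None,
            "Depart Time": None,
        },
        "Return Route": {
            "Return Date": None,
            "Return Time": None,
        },
    }


def unfilled_slots(tree: GoalTree) -> List[str]:
    """Depth-first list of leaf slots that still have no value."""
    missing: List[str] = []
    for name, child in tree.items():
        if isinstance(child, dict):
            missing.extend(unfilled_slots(child))
        elif child is None:
            missing.append(name)
    return missing
```

For the slide 6 request ("a flight from Miami to Boston two days after Christmas"), filling Depart Loc, Arrive Loc and Depart Date leaves Depart Time, Return Date and Return Time open.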

5 Discourse Plans
Clarifications: user-initiated subdialogs, usually questions, that ask about some feature of a concept related to one of the current plans on the focus stack.
Corrections: user-initiated subdialogs intended to correct part of an already constructed plan. They usually appear after explicit or implicit system confirmations.
Meta_communications: user-initiated subdialogs that refer to the dialogue itself, such as requests for repetition or signals of non-understanding.

6 Parser Score: Non-Fragmented Parse
User utterance: "I need a flight from Miami to Boston two days after Christmas"
Recognizer output (i): I_need flight from Miami to Boston two days after Christmas
Parser output (i):
Flight_Constraints:[departloc] ( FROM [Location] ( [city] ( [City_Name] ( MIAMI ) ) ) )
Flight_Constraints:[arriveloc] ( TO [Location] ( [city] ( [City_Name] ( BOSTON ) ) ) )
Flight_Constraints:[Date_Time] ( [Date] ([Date_Relative] ( [date_offset] ( [day_offset] ( [Number] ( TWO ) ) DAYS [_days_after] ( AFTER ) ) ) ) [holiday] ( [holiday_name] ( CHRISTMAS ) ) ) )

7 Fragmented Parse
Recognizer output (j): I_need flight from Miami to Boston two days other Christmas
Parser output (j):
Flight_Constraints:[departloc] ( FROM [Location] ( [city] ( [City_Name] ( MIAMI ) ) ) )
Flight_Constraints:[arriveloc] ( TO [Location] ( [city] ( [City_Name] ( BOSTON ) ) ) )
Flight_Constraints:[Time_Range] ( [Time] ( [Hour] ( TWO ) ) )
Flight_Constraints:[Date_Time] ( [Date] ( [holiday] ([holiday_name] ( CHRISTMAS ) ) ) )
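Slides 6 and 7 contrast a parse where "two days after Christmas" becomes one date concept with a parse where the misrecognized "other" splits it into a spurious hour and a bare holiday. The conclusion says parses are ranked by word coverage and information content; the sketch below is a hypothetical scoring rule under that reading, using word coverage as the primary key and the number of concepts as a stand-in fragmentation tie-breaker (an assumption, not the authors' formula). The concept word lists are read off the two parser outputs.

```python
from typing import List, Sequence, Tuple


def parse_score(words: Sequence[str], concepts: List[List[str]]) -> Tuple[float, int]:
    """Return (word coverage, -number of concepts); larger tuples rank higher.

    Hypothetical rule: coverage = words inside parsed concepts / total words;
    fewer concepts (less fragmentation) breaks ties.
    """
    covered = sum(len(c) for c in concepts)
    return covered / len(words), -len(concepts)


def best_parse(candidates: List[Tuple[str, Sequence[str], List[List[str]]]]) -> str:
    """Pick the label of the candidate (label, words, concepts) with the highest score."""
    return max(candidates, key=lambda c: parse_score(c[1], c[2]))[0]


WORDS_I = "I_need flight from Miami to Boston two days after Christmas".split()
CONCEPTS_I = [["FROM", "MIAMI"], ["TO", "BOSTON"], ["TWO", "DAYS", "AFTER", "CHRISTMAS"]]

WORDS_J = "I_need flight from Miami to Boston two days other Christmas".split()
CONCEPTS_J = [["FROM", "MIAMI"], ["TO", "BOSTON"], ["TWO"], ["CHRISTMAS"]]
```

Under this rule hypothesis (i) covers 8 of 10 words in 3 concepts and hypothesis (j) covers 6 of 10 in 4 concepts, so (i) is preferred.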

8 Utterance Type Classification Tree
[Figure: decision tree classifying an utterance as a question (Q) or a statement (S), starting from an even 0.5/0.5 split at the root. The root splits on F0_dif (> 15 vs. < 15); lower splits test End_slope (thresholds 4.07, 3.51, 2.56), F0_range (9, 7), Pen_slope (1.59), F0_pen_dif (5) and Reg_shape (1 vs. -1). Nodes are labeled with the majority class and its probability, ranging from 0.52 to 0.89.]
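Classifying an utterance with a tree like this means walking from the root, testing one prosodic feature against a threshold at each internal node, until a Q/S leaf is reached. The sketch below shows only that traversal. Its root test (F0_dif against 15) comes from the slide; the feature names and the 4.07 threshold reuse labels from the figure, but the branch layout and leaf probabilities are placeholders, not the authors' trained tree.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    label: str   # "Q" (question) or "S" (statement)
    prob: float  # probability of that label at this leaf


@dataclass
class Split:
    feature: str      # prosodic feature name, e.g. "F0_dif"
    threshold: float
    above: "Node"     # branch followed when the feature value > threshold
    below: "Node"     # branch followed otherwise


Node = Union[Leaf, Split]


def classify(node: Node, features: Dict[str, float]) -> Leaf:
    """Walk from the root to a leaf, testing one feature per internal node."""
    while isinstance(node, Split):
        node = node.above if features[node.feature] > node.threshold else node.below
    return node


# PLACEHOLDER tree: only the root test is taken from the slide.
EXAMPLE_TREE = Split(
    "F0_dif", 15.0,
    above=Split("End_slope", 4.07, above=Leaf("Q", 0.8), below=Leaf("S", 0.6)),
    below=Leaf("S", 0.7),
)
```

Only the features on the path actually taken are looked up, so an utterance routed to a shallow leaf needs no further measurements.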

9 How Prosody Can Help
Utterance transcription: and what's the first flight in the morning
Recognizer output (1): and with the first flight in the morning
Parser output (1): Flight_Reservation:[Flight_Reference]( WITH THE [Earliest](FIRST FLIGHT IN THE [Time_Range]( [Time_spec]( [Period_Of_Day]( MORNING ) ) ) ) )
Recognizer output (2): I'd what's the first flight in the morning
Parser output (2): Flight_Reservation:[Request]([Wh_form]( WHAT'S [Flight_Reference](THE [Earliest] ( FIRST FLIGHT IN THE [Time_Range]( [Time_Spec]( [Period_Of_Day]( MORNING ) ) ) ) ) ) )
Only output (2) preserves the question form ("what's"), which the parser marks as [Request]([Wh_form] ...); if the prosodic classifier labels the utterance a question, that evidence can favor output (2).
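One way to let the prosodic classifier influence the choice is to add, for each hypothesis, the probability the classifier assigns to the utterance type implied by its parse. The sketch below assumes a hypothetical mapping from the top-level parser slot to Q/S, invented recognizer scores (0.60 and 0.55), and equal weights for the two terms; slide 10 fits such weights from data instead.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from the top-level parser slot to the utterance type it implies.
SLOT_TO_TYPE = {"Request": "Q", "Flight_Reference": "S"}

Hypothesis = Tuple[str, float, str]  # (text, recognizer score, top-level parser slot)


def prosody_agreement(slot: str, p_type: Dict[str, float]) -> float:
    """Probability the prosodic classifier gives to the type implied by the parse."""
    return p_type[SLOT_TO_TYPE[slot]]


def rerank(hyps: List[Hypothesis], p_type: Dict[str, float]) -> List[Hypothesis]:
    """Sort hypotheses by recognizer score + prosodic agreement (equal weights assumed)."""
    return sorted(hyps, key=lambda h: h[1] + prosody_agreement(h[2], p_type), reverse=True)


# Invented recognizer scores: output (1) wins on recognition alone.
HYPS: List[Hypothesis] = [
    ("and with the first flight in the morning", 0.60, "Flight_Reference"),
    ("I'd what's the first flight in the morning", 0.55, "Request"),
]
```

With the classifier giving P(Q) = 0.77, output (2) moves to the top (0.55 + 0.77 against 0.60 + 0.23); if prosody instead pointed strongly to a statement, output (1) would stay first.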

10 Weights Computation: Least Squares Minimization / Hill Climbing
The error function:
$$E = \sum_i \Big( G_i - \sum_j W_j S_{ij} \Big)^2$$
where
i: training sample index
j: knowledge source index
G_i: training score, selected manually, for training sample i
W_j: score weight for knowledge source j
S_ij: score of knowledge source j for sample i
The minimum error is obtained by setting the derivative with respect to each weight W_k to zero, which gives a system of linear equations, one per knowledge source k:
$$\frac{\partial E}{\partial W_k} = -2 \sum_i S_{ik} \Big( G_i - \sum_j W_j S_{ij} \Big) = 0$$
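The normal equations above can be solved directly. The sketch below builds them from a matrix of knowledge-source scores S (one row per training sample) and the manually selected targets G, solves them by Gaussian elimination, and then uses the weights to pick the hypothesis with the largest combined score. The function names and the tiny synthetic training set are illustrative assumptions; the hill-climbing alternative mentioned on the slide is not sketched.

```python
from typing import List, Sequence


def fit_weights(S: Sequence[Sequence[float]], G: Sequence[float]) -> List[float]:
    """Least-squares W minimizing E = sum_i (G_i - sum_j W_j S_ij)^2.

    Builds the normal equations sum_i S_ik (G_i - sum_j W_j S_ij) = 0,
    one per knowledge source k, and solves them by Gaussian elimination.
    """
    k = len(S[0])
    # A[r][c] = sum_i S_ir * S_ic ; b[r] = sum_i S_ir * G_i
    A = [[sum(row[r] * row[c] for row in S) for c in range(k)] for r in range(k)]
    b = [sum(row[r] * g for row, g in zip(S, G)) for r in range(k)]
    M = [A[r] + [b[r]] for r in range(k)]  # augmented matrix [A | b]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry in this column up.
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    W = [0.0] * k
    for r in reversed(range(k)):
        W[r] = (M[r][k] - sum(M[r][c] * W[c] for c in range(r + 1, k))) / M[r][r]
    return W


def select_best(hyp_scores: Sequence[Sequence[float]], W: Sequence[float]) -> int:
    """Index of the hypothesis with the largest combined score sum_j W_j S_j."""
    return max(range(len(hyp_scores)),
               key=lambda i: sum(w * s for w, s in zip(W, hyp_scores[i])))


# Synthetic example with three knowledge sources (e.g. recognizer, parser, dialog).
S_TRAIN = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
G_TRAIN = [0.5, 0.3, 0.2, 1.0]
```

Because the synthetic targets are exactly 0.5 S_1 + 0.3 S_2 + 0.2 S_3, `fit_weights(S_TRAIN, G_TRAIN)` recovers those weights up to rounding. Solving the K-by-K normal equations is cheap here because the number of knowledge sources is small.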

11 Experimental Results

Table I. Comparison of systems performance
                   Testing with          Testing without
                   cumulative errors     cumulative errors
Baseline system    65%                   69%
Proposed system    71%                   78%

Table II. Measure of knowledge sources contribution
Source             Recognizer   Parser   Dialog context
Measure            83%          59%      48%
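The gains in Table I can also be read as the share of baseline errors removed. The sketch below assumes the percentages are accuracy figures, so that error = 100 minus the value; the function name is illustrative.

```python
def relative_error_reduction(baseline_acc: float, proposed_acc: float) -> float:
    """Fraction of the baseline's errors removed, treating error = 100 - accuracy (%)."""
    baseline_err = 100 - baseline_acc
    proposed_err = 100 - proposed_acc
    return (baseline_err - proposed_err) / baseline_err


# Accuracy pairs (baseline, proposed) from Table I.
TABLE_I = {
    "with cumulative errors": (65, 71),
    "without cumulative errors": (69, 78),
}

for condition, (base, prop) in TABLE_I.items():
    print(f"{condition}: +{prop - base} points, "
          f"{relative_error_reduction(base, prop):.1%} relative error reduction")
```

Under that assumption the absolute gains of 6 and 9 points correspond to removing about 17.1% (6 of 35) and 29.0% (9 of 31) of the baseline errors.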

12 CONCLUSION AND FUTURE WORK
Maximize the amount of information passed between system modules, and use all the higher-level knowledge to evaluate the different hypotheses.
Make decisions whenever possible and delay them when necessary.
Rank parse results according to word coverage and information content.
Use the expectation list generated from the current dialog state to select the most appropriate hypothesis.
Future work: use confidence measures from the recognizer output to confirm the selection.

