Presentation transcript: Feature extractor, Mel-Frequency Cepstral Coefficients (MFCCs), Feature vectors


3  Feature extractor

4  Mel-Frequency Cepstral Coefficients (MFCCs)  Feature vectors
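The feature extractor's job, in code terms: turn raw samples into a sequence of MFCC vectors. A minimal sketch using the librosa Python library (an illustration only; Sphinx4's own front end is written in Java, and the file name here is hypothetical):

import librosa

# Hypothetical input file; 16 kHz is the rate most Sphinx4 acoustic models expect.
signal, sr = librosa.load("six.wav", sr=16000)

# 13 MFCCs per analysis frame; each column is one feature vector.
mfccs = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
print(mfccs.shape)  # (13, number_of_frames)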

5  Acoustic Observations

6  Hidden States

7  Acoustic Observations  Hidden States  Acoustic Observation likelihoods

8 “Six”
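The word "six" would be modeled as an HMM whose hidden states are its phones (S IH K S in cmudict notation) and whose observations are the MFCC frames. A minimal sketch of the structure, with invented numbers rather than trained values:

# Illustrative HMM skeleton for "six"; all probabilities are made up.
states = ["S", "IH", "K", "S2"]   # final S renamed so the keys stay unique
transitions = {
    "S":  {"S": 0.6, "IH": 0.4},  # self-loops let a phone span many frames
    "IH": {"IH": 0.7, "K": 0.3},
    "K":  {"K": 0.5, "S2": 0.5},
    "S2": {"S2": 1.0},
}
# Observation likelihoods P(O_t | state) would come from the acoustic model.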


10  Constructs the HMMs of phones  Produces observation likelihoods

11  Constructs the HMMs for units of speech  Produces observation likelihoods  Sampling rate is critical!  WSJ (16 kHz) vs. WSJ_8k (8 kHz)

12  Constructs the HMMs for units of speech  Produces observation likelihoods  Sampling rate is critical!  WSJ (16 kHz) vs. WSJ_8k (8 kHz)  TIDIGITS, RM1, AN4, HUB4

13  Word likelihoods

14  ARPA format  Example:
1-grams:
-3.7839 board -0.1552
-2.5998 bottom -0.3207
-3.7839 bunch -0.2174
2-grams:
-0.7782 as the -0.2717
-0.4771 at all 0.0000
-0.7782 at the -0.2915
3-grams:
-2.4450 in the lowest
-0.5211 in the middle
-2.4450 in the on …
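In each entry, the left-hand number is a base-10 log probability and the trailing number is a backoff weight. As a sketch of reading the 1-gram section above (the file name is hypothetical, and this is not a complete ARPA parser):

import math

logprob = {}
with open("lm.arpa") as f:          # hypothetical language model file
    in_unigrams = False
    for line in f:
        line = line.strip()
        if line.endswith("1-grams:"):
            in_unigrams = True
            continue
        if line.endswith("2-grams:"):
            break
        if in_unigrams and line:
            parts = line.split()
            # log10(P(word)), word, optional backoff weight
            logprob[parts[1]] = float(parts[0])

print(10 ** logprob["board"])  # P("board") = 10^-3.7839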

15 public <basicCmd> = <startPolite> <command> <endPolite>;
public <startPolite> = (please | kindly | could you) *;
public <endPolite> = [ please | thanks | thank you ];
<command> = <action> <object>;
<action> = (open | close | delete | move);
<object> = [the | a] (window | file | menu);

16  Maps words to phoneme sequences

17  Example from cmudict.06d:
POULTICE P OW L T AH S
POULTICES P OW L T AH S IH Z
POULTON P AW L T AH N
POULTRY P OW L T R IY
POUNCE P AW N S
POUNCED P AW N S T
POUNCEY P AW N S IY
POUNCING P AW N S IH NG
POUNCY P UW NG K IY
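A minimal sketch of using such a dictionary: load it into a map from word to phone sequence (file name illustrative; real cmudict files also carry comments and alternate pronunciations such as WORD(2), which this sketch ignores):

pronunciations = {}
with open("cmudict.06d") as f:
    for line in f:
        parts = line.split()
        if parts:                       # skip blank lines
            word, phones = parts[0], parts[1:]
            pronunciations[word] = phones

print(pronunciations["POULTRY"])  # ['P', 'OW', 'L', 'T', 'R', 'IY']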

18  Constructs the search graph of HMMs from:  Acoustic model  Statistical Language model ~or~  Grammar  Dictionary


21  Can be statically or dynamically constructed

22  FlatLinguist

23  DynamicFlatLinguist

24  FlatLinguist  DynamicFlatLinguist  LexTreeLinguist

25  Maps feature vectors to search graph

26  Searches the graph for the “best fit”

27  P(sequence of feature vectors | word/phone)  a.k.a. P(O|W): how likely the input is to have been generated by the word

28 f ay ay ay ay v v v v v
f f ay ay ay ay v v v v
f f f ay ay ay ay v v v
f f f f ay ay ay ay v v
f f f f ay ay ay ay ay v
f f f f f ay ay ay ay v
f f f f f f ay ay ay v
…

29  (Figure: Viterbi trellis; observations O1, O2, O3 over time)

30  Prunes low-scoring paths from the search during decoding

31  Words!

32  Word error rate: the most common metric  Measures the # of edits needed to transform the recognized sentence into the reference sentence

33  Reference: “This is a reference sentence.”  Result: “This is neuroscience.”

34  Reference: “This is a reference sentence.”  Result: “This is neuroscience.”  Requires 2 deletions, 1 substitution

35  Reference: “This is a reference sentence.”  Result: “This is neuroscience.”

36  Reference: “This is a reference sentence.”  Result: “This is neuroscience.”  D S D
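A minimal sketch of the underlying computation, standard word-level edit distance (the function name is illustrative):

def word_edit_distance(ref, hyp):
    """Minimum substitutions, insertions, and deletions to align hyp with ref."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between the first i ref words and first j hyp words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(r)][len(h)]

# 2 deletions + 1 substitution = 3 edits against a 5-word reference
print(word_edit_distance("this is a reference sentence", "this is neuroscience"))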


48  Limited Vocab Multi-Speaker

49  Extensive Vocab Single Speaker

50  *With noisy audio input, expect roughly double the error rate

51 Other variables:
- Continuous vs. isolated speech
- Conversational vs. read speech
- Dialect

52  Questions?

53  (Figure: Viterbi trellis; observations O1, O2, O3 over time)

54  (Figure: Viterbi trellis; competing transitions into frame O2, scored as P(ay | f) * P(O2 | ay) and P(f | f) * P(O2 | f))

55  (Figure: Viterbi trellis; path score P(O1) * P(ay | f) * P(O2 | ay))

56  (Figure: Viterbi trellis; observations O1, O2, O3 over time)
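Reading the trellis figures above: a path's score is built frame by frame, multiplying the transition probability into a state by that state's observation likelihood. A toy version of the slide-55 product with invented numbers:

# Illustrative only: made-up probabilities for the path f -> ay over O1, O2
p_O1_given_f = 0.5    # observation likelihood of O1 in state f
p_ay_given_f = 0.4    # transition probability f -> ay
p_O2_given_ay = 0.6   # observation likelihood of O2 in state ay

path_score = p_O1_given_f * p_ay_given_f * p_O2_given_ay
print(path_score)  # 0.12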

57  Common Sphinx4 FAQs can be found online: http://cmusphinx.sourceforge.net/sphinx4/doc/Sphinx4-faq.html  What follows are some less-frequently-asked questions

58  Q. Is a search graph created for every recognition result or one for the recognition app?  A. It depends on which Linguist is used. The FlatLinguist generates the entire search graph up front and holds it in memory, which is only practical for small-vocabulary tasks. The LexTreeLinguist generates search states dynamically, allowing it to handle very large vocabularies.

59  Q. How does the Viterbi algorithm save computation over exhaustive search?  A. The Viterbi algorithm saves memory and computation by reusing solutions to subproblems shared across the larger solution, so probability calculations that recur on different paths through the search graph are computed only once.  Viterbi cost = n² to n³  Exhaustive search cost = 2ⁿ to 3ⁿ
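A minimal sketch of that dynamic-programming reuse (a toy two-state HMM with hand-picked numbers; not Sphinx4's implementation):

# Toy Viterbi over states "f" and "ay"; all probabilities are illustrative.
states = ["f", "ay"]
start_p = {"f": 1.0, "ay": 0.0}
trans_p = {"f": {"f": 0.6, "ay": 0.4}, "ay": {"f": 0.0, "ay": 1.0}}
# emit_p[s][t] stands in for P(O_t | s) from the acoustic model
emit_p = {"f": [0.7, 0.4, 0.1], "ay": [0.1, 0.5, 0.8]}

# best[s] holds the score of the best path ending in s. Each entry is
# computed once per frame and reused, instead of re-scoring every full path.
best = {s: start_p[s] * emit_p[s][0] for s in states}
for t in range(1, 3):
    best = {s: max(best[prev] * trans_p[prev][s] for prev in states) * emit_p[s][t]
            for s in states}
print(max(best.values()))  # score of the single best path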

60  Q. Does the linguist use a grammar to construct the search graph if it is available?  A. Yes, a grammar graph is created

61  Q. What algorithm does the Pruner use?  A. Sphinx4 uses absolute and relative beam pruning

62  Absolute Beam Width – # of active search paths

63  Absolute Beam Width – # of active search paths  Relative Beam Width – probability threshold

64  Absolute Beam Width – # of active search paths  Relative Beam Width – probability threshold  Word Insertion Probability – word-break likelihood

65  Absolute Beam Width – # of active search paths  Relative Beam Width – probability threshold  Word Insertion Probability – word-break likelihood  Language Weight – boosts language model scores

66  Silence Insertion Probability – likelihood of inserting silence

67  Silence Insertion Probability – likelihood of inserting silence  Filler Insertion Probability – likelihood of inserting filler words
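In Sphinx4 these knobs are set in the XML configuration file. A hedged sketch with illustrative values (property names as they commonly appear in Sphinx4 configs; the exact components they attach to vary by setup):

<property name="absoluteBeamWidth" value="500"/>
<property name="relativeBeamWidth" value="1E-80"/>
<property name="wordInsertionProbability" value="0.7"/>
<property name="languageWeight" value="10.5"/>
<property name="silenceInsertionProbability" value="0.1"/>
<property name="fillerInsertionProbability" value="1E-2"/>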

68  To call a Java example from Python:
import subprocess
subprocess.call(["java", "-mx1000m", "-jar",
                 "/Users/Username/sphinx4/bin/Transcriber.jar"])

69  Speech and Language Processing, 2nd Ed., Daniel Jurafsky and James Martin, Pearson, 2009  Artificial Intelligence, 6th Ed., George Luger, Addison Wesley, 2009  Sphinx Whitepaper: http://cmusphinx.sourceforge.net/sphinx4/#whitepaper  Sphinx Forum: https://sourceforge.net/projects/cmusphinx/forums

