Android Speech Recognition CS 5390 Mobile Application Development 04/27/2017 Presented by: Marco López
Outline What it is API Demo Resources
What it is
Speech recognition is when software uses a microphone to detect sounds and tries to map them to words.
In Android, speech recognition can be done through a built-in service or activity.
Interpreting a language other than the device's configured language requires internet connectivity. Because the recognition service, not the app, accesses the network, the app being developed does not need to declare the INTERNET permission.
Google performs this speech recognition with machine learning. Its engine is trained on large samples of people speaking and learns which words are used more often and in what context.
Since this service uses internet connectivity, running it constantly is not recommended, as it will consume large amounts of battery and bandwidth.
Interpreter challenges: dialect (US vs. British), user enunciation, microphone quality, distance from the microphone, environmental noise, accents.
API
Android introduced speech recognition in API level 3 (Cupcake) through RecognizerIntent. This intent launches a new activity that runs the speech recognition software and returns its results to the caller.
API level 8 (Froyo) introduced SpeechRecognizer, which is an Android service. SpeechRecognizer requires the RECORD_AUDIO permission:
    <uses-permission android:name="android.permission.RECORD_AUDIO"/>
These are the two basic speech recognition APIs.
API - RecognizerIntent
Intent action: Summary
ACTION_RECOGNIZE_SPEECH: Starts an activity that will prompt the user for speech and send it through a speech recognizer.
ACTION_VOICE_SEARCH_HANDS_FREE: Starts an activity that will prompt the user for speech without requiring the user's visual attention or touch input.
ACTION_WEB_SEARCH: Starts an activity that will prompt the user for speech, send it through a speech recognizer, and either display a web search result or trigger another type of action based on the user's speech.
API – RecognizerIntent (cont.)
To start Android's voice recognition, an intent must be created. startActivityForResult is called so that the result can be received back from the recognition activity:
    Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
    try {
        startActivityForResult(intent, REQ_CODE_SPEECH_INPUT);
    } catch (ActivityNotFoundException a) {
        // No activity on this device can handle speech recognition.
        Toast.makeText(getApplicationContext(),
                getString(R.string.speech_not_supported),
                Toast.LENGTH_SHORT).show();
    }
EXTRA_LANGUAGE_MODEL: Informs the recognizer which speech model to prefer when performing ACTION_RECOGNIZE_SPEECH. The recognizer uses this information to fine-tune the results. This extra is required. Activities implementing ACTION_RECOGNIZE_SPEECH may interpret the values as they see fit.
LANGUAGE_MODEL_FREE_FORM: Use a language model based on free-form speech recognition. This is a value to use for EXTRA_LANGUAGE_MODEL.
LANGUAGE_MODEL_WEB_SEARCH: Use a language model based on web search terms. This is a value to use for EXTRA_LANGUAGE_MODEL.
API - RecognizerIntent (cont.)
Different languages are supported when recognizing speech. The desired language must be specified in the intent.
The language must be an IETF language tag (as defined by BCP 47), for example "en-US":
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US");
It is also possible to set the language to the device's locale. EXTRA_LANGUAGE expects a language tag string, so the Locale is converted with toLanguageTag() (API level 21+):
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault().toLanguageTag());
A Locale has 5 fields: language, script, country (region), variant, and extensions.
The prompt text shown in the invoked activity can be changed with:
    intent.putExtra(RecognizerIntent.EXTRA_PROMPT, getString(R.string.speech_prompt));
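The relationship between Locale fields and BCP 47 language tags can be tried out in plain Java, without an Android device. The following is a minimal sketch; the class name LanguageTagDemo and the helper tagFor are made up for illustration, and it assumes Java 7+ (or Android API level 21+), where Locale.Builder, forLanguageTag, and toLanguageTag are available.

```java
import java.util.Locale;

/** Sketch: building and inspecting BCP 47 language tags with java.util.Locale. */
final class LanguageTagDemo {

    /** Hypothetical helper: returns the tag for a language/region pair. */
    static String tagFor(String language, String region) {
        return new Locale.Builder()
                .setLanguage(language)
                .setRegion(region)
                .build()
                .toLanguageTag();
    }

    public static void main(String[] args) {
        // Parse a tag into a Locale and read back two of its fields.
        Locale mexicanSpanish = Locale.forLanguageTag("es-MX");
        System.out.println(mexicanSpanish.getLanguage()); // es
        System.out.println(mexicanSpanish.getCountry());  // MX

        // A tag that also carries a script subtag.
        Locale serbianLatin = Locale.forLanguageTag("sr-Latn-RS");
        System.out.println(serbianLatin.getScript());     // Latn

        // The string form that can be passed as EXTRA_LANGUAGE.
        System.out.println(tagFor("en", "US"));           // en-US
    }
}
```

The string returned by tagFor is the kind of value that would be passed to intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, ...).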
API - RecognizerIntent (cont.)
onActivityResult is called when the activity started by startActivityForResult finishes. The speech recognition activity returns an ArrayList of String through the intent. This list contains possible interpretations of what the user said into the microphone.
    /**
     * Receiving speech input
     */
    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        switch (requestCode) {
            case REQ_CODE_SPEECH_INPUT: {
                if (resultCode == RESULT_OK && data != null) {
                    ArrayList<String> result =
                            data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
                    // Guard against a missing or empty result list before reading it.
                    if (result != null && !result.isEmpty()) {
                        txtSpeechInput.setText(result.get(0));
                    }
                }
                break;
            }
        }
    }
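Because the returned list holds several possible interpretations, an app that expects specific voice commands may want to look past the first entry. The sketch below is one possible approach in plain Java; the class CommandMatcher and its command vocabulary ("play", "pause", "stop") are hypothetical and not part of the Android API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Sketch: scan the recognizer's candidate strings for a known command. */
final class CommandMatcher {

    // Hypothetical vocabulary; a real app would define its own commands.
    private static final List<String> COMMANDS = Arrays.asList("play", "pause", "stop");

    /** Returns the first candidate that is a known command, or null if none matches. */
    static String firstCommand(List<String> candidates) {
        if (candidates == null) {
            return null;
        }
        for (String candidate : candidates) {
            if (candidate == null) {
                continue;
            }
            // Normalize case and surrounding whitespace before comparing.
            String normalized = candidate.trim().toLowerCase(Locale.ROOT);
            if (COMMANDS.contains(normalized)) {
                return normalized;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> heard = Arrays.asList("plate", "Play", "played");
        System.out.println(firstCommand(heard)); // play
    }
}
```

In onActivityResult, the list obtained from EXTRA_RESULTS could be passed to firstCommand instead of always taking result.get(0).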
API - SpeechRecognizer
Using SpeechRecognizer requires a RecognitionListener.
RecognitionListener method: Summary
onReadyForSpeech(Bundle): Called when the endpointer is ready for the user to start speaking.
onBeginningOfSpeech(): The user has started to speak.
onRmsChanged(float): The sound level in the audio stream has changed.
onBufferReceived(byte[]): More sound has been received.
onEndOfSpeech(): Called after the user stops speaking.
onError(int): A network or recognition error occurred.
onResults(Bundle): Called when recognition results are ready.
onPartialResults(Bundle): Called when partial recognition results are available.
onEvent(int, Bundle): Reserved for adding future events.
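The int passed to onError is one of the SpeechRecognizer.ERROR_* constants. As a rough sketch, the codes can be turned into readable messages as shown below. The class SpeechErrors and the message wording are made up for illustration, and the constants are mirrored locally only so that the example compiles without the Android SDK; in an app, use the SpeechRecognizer.ERROR_* names directly.

```java
/** Sketch: map SpeechRecognizer error codes to readable messages. */
final class SpeechErrors {

    // Local copies of the SpeechRecognizer.ERROR_* values (assumption: an app
    // would reference the Android constants instead of redefining them).
    static final int ERROR_NETWORK_TIMEOUT = 1;
    static final int ERROR_NETWORK = 2;
    static final int ERROR_AUDIO = 3;
    static final int ERROR_SERVER = 4;
    static final int ERROR_CLIENT = 5;
    static final int ERROR_SPEECH_TIMEOUT = 6;
    static final int ERROR_NO_MATCH = 7;
    static final int ERROR_RECOGNIZER_BUSY = 8;
    static final int ERROR_INSUFFICIENT_PERMISSIONS = 9;

    /** Returns a short description for the given error code. */
    static String describe(int error) {
        switch (error) {
            case ERROR_NETWORK_TIMEOUT: return "Network operation timed out";
            case ERROR_NETWORK: return "Network error";
            case ERROR_AUDIO: return "Audio recording error";
            case ERROR_SERVER: return "Server error";
            case ERROR_CLIENT: return "Client-side error";
            case ERROR_SPEECH_TIMEOUT: return "No speech input";
            case ERROR_NO_MATCH: return "No recognition result matched";
            case ERROR_RECOGNIZER_BUSY: return "Recognition service is busy";
            case ERROR_INSUFFICIENT_PERMISSIONS: return "Insufficient permissions";
            default: return "Unknown error " + error;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(ERROR_NO_MATCH)); // No recognition result matched
    }
}
```

A listener's onError(int) could log or display describe(error), for example to remind the user to grant RECORD_AUDIO when ERROR_INSUFFICIENT_PERMISSIONS is reported.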
API – SpeechRecognizer (cont.)
    @Override
    public void onResults(Bundle results) {
        Log.d(TAG, "onResults " + results);
        // Candidate transcriptions returned by the recognizer.
        ArrayList<String> data =
                results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
        String first = "";
        if (data != null) {
            for (int i = 0; i < data.size(); i++) {
                Log.d(TAG, "result " + data.get(i));
                if (i == 0) {
                    first = data.get(i);
                }
            }
        }
        // Display only the first candidate.
        mText.setText("results: " + first);
    }
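Besides RESULTS_RECOGNITION, the results Bundle may also carry a float array under SpeechRecognizer.CONFIDENCE_SCORES (API level 14+), with one score per candidate. The sketch below shows one way the two could be combined in plain Java. The class BestResultPicker is hypothetical, and falling back to the first candidate when scores are missing is an assumption of this example, not documented behavior.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: choose a candidate transcription using optional confidence scores. */
final class BestResultPicker {

    /**
     * Returns the candidate with the highest score. Falls back to the first
     * candidate when scores are missing or do not line up with the list.
     */
    static String pick(List<String> candidates, float[] scores) {
        if (candidates == null || candidates.isEmpty()) {
            return null;
        }
        if (scores == null || scores.length != candidates.size()) {
            return candidates.get(0);
        }
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return candidates.get(best);
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("recognize speech", "wreck a nice beach");
        System.out.println(pick(candidates, new float[] {0.92f, 0.41f})); // recognize speech
        System.out.println(pick(candidates, null));                       // recognize speech
    }
}
```

Inside onResults, the inputs would come from results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION) and results.getFloatArray(SpeechRecognizer.CONFIDENCE_SCORES).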
API - SpeechRecognizer (cont.)
Declare a SpeechRecognizer inside the activity and create the intent the same way as with RecognizerIntent, but this time call startListening on the SpeechRecognizer instead of starting a new activity.
    sr = SpeechRecognizer.createSpeechRecognizer(this);
    sr.setRecognitionListener(new RecognitionListener() {…});
    Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "es-MX");
    intent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 5);
    sr.startListening(intent);
Example
Resources
https://developer.android.com/reference/java/util/Locale.html
https://developer.android.com/reference/android/speech/RecognizerIntent.html
https://developer.android.com/reference/android/speech/SpeechRecognizer.html