Difference between revisions of "Automated Speech Recognition"

From Engineered Arts Wiki

Revision as of 09:36, 4 June 2018

Introduction

Automated speech recognition is the process of turning sound picked up by the robot's microphone into text. The text may then be passed on to, for example, a chat subsystem, which interprets it as the human side of a conversation with the robot. The robot may reply using its text-to-speech capability.

Speech to text is provided by the Speech Recognition Tritium node. The actual recognition is done by a configurable backend, with Google's Cloud Speech-to-Text API as the default (and currently only) option.
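
As a rough illustration of what a "configurable backend" could look like, the sketch below keeps a small registry of recognition functions keyed by name. Everything in it is hypothetical: the names BACKENDS, register_backend and recognise, and the "echo" test backend, are invented for this example and are not taken from the Speech Recognition node itself.

```python
from typing import Callable, Dict

# Hypothetical: a backend is any function that maps raw audio bytes to text.
Backend = Callable[[bytes], str]

# Registry of available backends, keyed by a short name.
BACKENDS: Dict[str, Backend] = {}


def register_backend(name: str) -> Callable[[Backend], Backend]:
    """Decorator that adds a recognition function to the registry."""
    def decorator(func: Backend) -> Backend:
        BACKENDS[name] = func
        return func
    return decorator


@register_backend("google")
def google_backend(audio: bytes) -> str:
    # Placeholder only: a real backend would send the audio to the
    # Cloud Speech-to-Text API and return the transcript.
    raise NotImplementedError("call the recognition service here")


@register_backend("echo")
def echo_backend(audio: bytes) -> str:
    # Stand-in backend (an assumption for this sketch) so the flow can be
    # exercised without a network connection.
    return audio.decode("utf-8", errors="replace")


def recognise(audio: bytes, backend: str = "google") -> str:
    """Look up the named backend and use it to turn audio into text."""
    try:
        func = BACKENDS[backend]
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None
    return func(audio)
```

With a layout like this, adding a second recognition service would only mean registering another function; the code that asks for text would not change.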

Google Cloud Speech-to-Text API

Google's Cloud Speech-to-Text is probably the best service of its kind available. It uses trained neural networks to convert audio into text and, compared to similar services, does a good job of transcribing speech correctly.

To use the Google API the Tritium node must provide authentication credentials, created using the API management web application.

The process to create new credentials is as follows:

  1. Create a Google Cloud API project
  2. Generate a private service account key JSON file
  3. Download the JSON file, saving it on the robot as:
     /opt/tritium/nodes/speech_recognition/google_application_credentials.json

The Before You Begin section of the Quickstart: Using Client Libraries tutorial covers this process. (You do not need to follow the rest of the tutorial.)
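
Once the key file has been copied to the robot, a quick sanity check can catch a missing or truncated download before the node is started. The following is a minimal sketch, not part of Tritium: the function name check_credentials and the choice of fields to check are assumptions, based on the fields that Google service account key files normally contain.

```python
import json
from pathlib import Path

# Location given in the steps above.
CREDENTIALS_PATH = Path(
    "/opt/tritium/nodes/speech_recognition/google_application_credentials.json"
)

# Fields a service account key file normally contains (a subset, chosen
# for this sketch).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_credentials(path: Path = CREDENTIALS_PATH) -> list:
    """Return a list of problems; an empty list means the file looks usable."""
    if not path.is_file():
        return [f"missing file: {path}"]
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:
        # ValueError covers both invalid JSON and bad text encoding.
        return [f"unreadable or invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data
    ]
    if data.get("type") != "service_account":
        problems.append("field 'type' is not 'service_account'")
    return problems
```

Calling check_credentials() with no arguments inspects the default path. The check only looks at the shape of the file; it does not contact Google, so it cannot tell whether the key has been revoked or whether the Speech-to-Text API is enabled for the project.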