Create long-form audio
This document walks you through the process of synthesizing long-form audio. Long Audio Synthesis asynchronously synthesizes up to 1 million bytes of input. To learn more about the fundamental concepts in Text-to-Speech, read Text-to-Speech Basics.
Before you begin
Before you can send a request to the Text-to-Speech API, you must have completed the following actions. See the before you begin page for details.
- Enable Text-to-Speech on a GCP project.
- Make sure billing is enabled for Text-to-Speech.
- Make sure you have the following Identity and Access Management (IAM) roles on the output GCS bucket:
  - Storage Object Creator
  - Storage Object Viewer
- After installing the Google Cloud CLI, configure the gcloud CLI to use your federated identity and then initialize it by running the following command:
  gcloud init
Synthesize long audio from text using the command line
You can convert long-form text to audio by making an HTTP POST request to the https://texttospeech.googleapis.com/v1beta1/projects/{$project_number}/locations/global:synthesizeLongAudio endpoint.
In the body of your POST command, specify the following fields.
• voice: The type of voice to synthesize.
• input.text: The text to synthesize.
• audioConfig: The type of audio to create.
• output_gcs_uri: The GCS output file path, in the form "gs://bucket_name/file_name.wav".
• parent: The parent, in the form "projects/{YOUR PROJECT NUMBER}/locations/{YOUR PROJECT LOCATION}".
The input can contain up to 1 MB of characters; the exact limit can vary between inputs.
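The field requirements above can be checked locally before a request is sent. The following is a minimal sketch, not part of the API: the function name, the regular expression, and the choice to measure the input as UTF-8 bytes are all assumptions made for illustration, and the service remains the authority on the real limit.

```python
import re

# Assumed ceiling, taken from the "1 million bytes" figure in this document.
# The exact limit can vary by input, so this is only a rough pre-check.
MAX_INPUT_BYTES = 1_000_000

# Hypothetical pattern for paths shaped like "gs://bucket_name/file_name.wav".
_GCS_URI = re.compile(r"^gs://[^/\s]+/\S+$")


def check_long_audio_input(text, output_gcs_uri):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if not text:
        problems.append("input text is empty")
    # Assumption: the limit is counted in UTF-8 encoded bytes.
    size = len(text.encode("utf-8"))
    if size > MAX_INPUT_BYTES:
        problems.append(f"input is {size} bytes, above {MAX_INPUT_BYTES}")
    if not _GCS_URI.match(output_gcs_uri):
        problems.append(
            "output_gcs_uri should look like gs://bucket_name/file_name.wav"
        )
    return problems
```

A caller might run this check first and only build the request when the returned list is empty.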
Create a Google Cloud Storage bucket under the project that is used to run the synthesis. Make sure the service account used to run the synthesis has read/write access to the output GCS bucket.
Execute the REST request below at the command line to synthesize audio from text using Text-to-Speech. The command uses the gcloud auth application-default print-access-token command to retrieve an authorization token for the request. Make sure that the service account running the GET operation has the Text-to-Speech Editor role.
HTTP method and URL:
POST https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global:synthesizeLongAudio
Request JSON body:
{
  "parent": "projects/12345/locations/global",
  "audio_config": {
    "audio_encoding": "LINEAR16"
  },
  "input": {
    "text": "hello"
  },
  "voice": {
    "language_code": "en-us",
    "name": "en-us-Standard-A"
  },
  "output_gcs_uri": "gs://bucket_name/file_name.wav"
}
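As one possible way to send this request from a script, the sketch below builds the URL and JSON body shown above and posts them using only the Python standard library. The function names are hypothetical, and the token helper assumes the gcloud CLI is installed and on the PATH.

```python
import json
import subprocess
import urllib.request

API_ROOT = "https://texttospeech.googleapis.com/v1beta1"


def build_synthesis_request(project_number, location, text, output_gcs_uri,
                            language_code="en-us",
                            voice_name="en-us-Standard-A"):
    """Return (url, body) for a synthesizeLongAudio call, mirroring the JSON above."""
    parent = f"projects/{project_number}/locations/{location}"
    url = f"{API_ROOT}/{parent}:synthesizeLongAudio"
    body = {
        "parent": parent,
        "audio_config": {"audio_encoding": "LINEAR16"},
        "input": {"text": text},
        "voice": {"language_code": language_code, "name": voice_name},
        "output_gcs_uri": output_gcs_uri,
    }
    return url, body


def get_access_token():
    """Fetch a token with the gcloud command named in this document."""
    completed = subprocess.run(
        ["gcloud", "auth", "application-default", "print-access-token"],
        check=True, capture_output=True, text=True,
    )
    return completed.stdout.strip()


def send_synthesis_request(url, body, token):
    """POST the body and return the parsed JSON describing the operation."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping the request construction separate from the network call makes it possible to inspect or log the body before anything is sent.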
You should receive a JSON response similar to the following:
{
  "name": "23456",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.texttospeech.v1beta1.SynthesizeLongAudioMetadata",
    "progressPercentage": 0,
    "startTime": "2022-12-20T00:46:56.296191037Z",
    "lastUpdateTime": "2022-12-20T00:46:56.296191037Z"
  },
  "done": false
}
The JSON output for the REST command contains the long-running operation name in the name field. Execute the REST request below at the command line to query the state of the long-running operation. Make sure that the service account running the GET operation is from the same project as the one used for synthesis.
HTTP method and URL:
GET https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations/23456
You should receive a JSON response similar to the following:
{
  "name": "projects/12345/locations/global/operations/23456",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.texttospeech.v1beta1.SynthesizeLongAudioMetadata",
    "progressPercentage": 100
  },
  "done": true
}
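Because synthesis is asynchronous, a script typically repeats this GET request until done becomes true. The sketch below shows one way to structure that loop. The fetch_operation argument is a hypothetical, caller-supplied function that performs the GET request and returns the parsed JSON. The polling interval and retry count are arbitrary illustrative values, not recommendations from the API.

```python
import time


def wait_for_operation(fetch_operation, operation_name, poll_seconds=10.0,
                       max_polls=60, sleep=time.sleep):
    """Poll until the operation reports done, then return its final JSON dict.

    fetch_operation is a hypothetical callable that issues the GET request
    for operation_name and returns the decoded response.
    """
    for _ in range(max_polls):
        operation = fetch_operation(operation_name)
        if operation.get("done"):
            return operation
        # progressPercentage comes from the metadata block shown above.
        progress = operation.get("metadata", {}).get("progressPercentage", 0)
        print(f"{operation_name}: {progress}% complete")
        sleep(poll_seconds)
    raise TimeoutError(f"{operation_name} not done after {max_polls} polls")
```

Passing the sleep function as a parameter keeps the loop easy to exercise in tests without real delays.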
To query the list of all operations running under a given project, execute the REST request below.
Make sure that the service account running the LIST operation is from the same project as the one used for synthesis.
HTTP method and URL:
GET https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations
You should receive a JSON response similar to the following:
{
  "operations": [
    {
      "name": "12345",
      "done": false
    },
    {
      "name": "23456",
      "done": false
    }
  ],
  "nextPageToken": ""
}
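A response like this can be reduced to the names of the operations that are still running. The sketch below does so and follows nextPageToken when it is non-empty. The fetch_page argument is a hypothetical callable that takes a page token (an empty string for the first page) and returns the parsed JSON of the LIST request. How the token is passed back to the service is left to that callable, since this document does not specify it.

```python
def pending_operation_names(fetch_page):
    """Collect names of operations that are not yet done, across all pages.

    fetch_page is a hypothetical callable: it receives a page token and
    returns the decoded JSON of one LIST response.
    """
    names = []
    token = ""
    while True:
        page = fetch_page(token)
        for operation in page.get("operations", []):
            if not operation.get("done", False):
                names.append(operation["name"])
        # An empty or missing nextPageToken means there are no more pages.
        token = page.get("nextPageToken", "")
        if not token:
            return names
```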
Once the long-running operation completes successfully, find the output audio file at the bucket URI given in the output_gcs_uri field. If the operation did not complete successfully, find the error by querying the operation with the GET REST command, correct the error, and issue the RPC again.
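The success and failure cases can be told apart from the operation JSON returned by the GET request. The sketch below assumes the response follows the standard long-running operation shape, in which a failed operation carries an error object with code and message fields. The function name and the returned labels are illustrative choices.

```python
def operation_outcome(operation):
    """Classify a parsed operation response.

    Returns a (status, detail) tuple where status is "running", "failed",
    or "succeeded", and detail is an error description or None.
    """
    if not operation.get("done"):
        return ("running", None)
    error = operation.get("error")
    if error:
        # Assumption: the error object has "code" and "message" fields.
        detail = f"code {error.get('code')}: {error.get('message')}"
        return ("failed", detail)
    return ("succeeded", None)
```

On a "failed" result, the detail string gives a starting point for correcting the request before issuing it again.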
Synthesize long audio from text using client libraries
Install the client library
Python
Before installing the library, make sure you've prepared your environment for Python development.
pip install --upgrade google-cloud-texttospeech
Create audio data
You can use Text-to-Speech to create a long audio file of synthetic human speech. Use the following code to create a long audio file in your GCS bucket.
Python
Before running the example, make sure you've prepared your environment for Python development.
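The following is a minimal sketch of such a call with the google-cloud-texttospeech package, offered as an illustration rather than the official sample. It assumes an installed library version that exposes TextToSpeechLongAudioSynthesizeClient and SynthesizeLongAudioRequest, and credentials with write access to the output bucket. The helper build_parent, the default location, and the timeout value are hypothetical choices.

```python
def build_parent(project_id, location="global"):
    """Format the parent resource name described earlier in this document."""
    return f"projects/{project_id}/locations/{location}"


def synthesize_long_audio(project_id, output_gcs_uri, text,
                          location="global", timeout=300):
    """Start long audio synthesis and block until the operation finishes."""
    # Imported here so this module still loads where the library is absent.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechLongAudioSynthesizeClient()
    request = texttospeech.SynthesizeLongAudioRequest(
        parent=build_parent(project_id, location),
        input=texttospeech.SynthesisInput(text=text),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.LINEAR16
        ),
        voice=texttospeech.VoiceSelectionParams(
            language_code="en-US", name="en-US-Standard-A"
        ),
        output_gcs_uri=output_gcs_uri,
    )
    # The client returns a long-running operation; result() waits for it.
    operation = client.synthesize_long_audio(request=request)
    return operation.result(timeout=timeout)
```

For example, synthesize_long_audio("12345", "gs://bucket_name/file_name.wav", "hello") would write the audio to the given bucket path, assuming the project and bucket exist.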
Clean up
To avoid unnecessary Google Cloud Platform charges, use the Google Cloud console to delete your project if you do not need it.
What's next
- Learn more about Cloud Text-to-Speech by reading the basics.
- Review the list of available voices you can use for synthetic speech.