User Media

The User Media object retrieves camera or microphone input from the user. It also provides speech recognition, speech synthesis (also known as text-to-speech) and ambient light readings. This requires appropriate hardware on the user's system, such as a webcam on a PC, a camera on a phone, or a microphone. Camera snapshots can be taken and transferred into Sprite or Tiled Background objects, and microphone input can be analysed with the Audio object.

For security reasons, most browsers will prompt the user for permission before allowing user media input, and will display clear notifications that the media device is currently being used, such as a recording icon in the system tray or tab icon.

The User Media object also has the common object features, including the ability to apply effects to video feeds.

For several examples of what the User Media object can do, search for User Media in the Start dialog.

In the layout

The User Media object appears in the layout view as a rectangle marked with a red cross. This shows where the video feed will be displayed in the layout. If you only need microphone input, place the User Media object outside the layout.

User Media conditions

On ambient light reading update: Triggered when the ambient light reading (the AmbientLux expression) changes. This only happens if the device has an appropriate sensor.
Is canvas recording supported: Returns true if the current browser/platform supports recording a video from the game's canvas with the Start recording canvas action.
Is recording format supported: True if the current browser/platform supports recording a video in the given video format with the Start recording canvas action.
On canvas recording ready: Triggered after the Stop recording canvas action when the recording is available to be downloaded. The recording is typically downloaded by passing the CanvasRecordingURL expression to the Browser object's 'Invoke download' action.
On media request approved: Triggered when the user confirms a security prompt after the Request camera or Request microphone actions, indicating their approval to allow the application to use media input.
On media request declined: Triggered when the user cancels a security prompt after the Request camera or Request microphone actions, indicating they do not approve the application's request to use media input.
On retrieved media sources: Triggered after the Get media sources action completes, when the list of media sources is available through the AudioSource and CameraSource expressions (such as AudioSourceCount and CameraSourceLabelAt).
Is recognising speech: True if a speech recognition request has been approved, and speech input through a microphone is actively being recognised.
On speech recognition end: Triggered after the Stop speech recognition action, or after the user stops speaking in Single phrase mode speech recognition.
On speech recognition error: Triggered if there is an error approving speech recognition, or during speech recognition. The SpeechError expression is set to a string which describes the type of problem, e.g. "not-allowed" if permission was declined.
On speech recognition result: Triggered during active speech recognition when the interim or final transcript has changed. Use the FinalTranscript and InterimTranscript expressions to get the updated result.
On speech recognition start: Triggered after the Request speech recognition action once the user has approved any permission prompt.
Supports speech recognition: True if the current browser or platform supports speech recognition. If false, none of the speech recognition features of the object will work.
Is speaking: True if the speech synthesis engine is currently reading out some text.
Supports speech synthesis: True if the current browser supports speech synthesis, so the Speak text action can work.
Supports user media: True if the current browser supports the User Media object. Not all browsers support the necessary features; if this is false, media input is unavailable. It may be true even if the user has no media devices installed on their system, since it only indicates whether the browser is capable of supporting media input.
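
The check described by Is recording format supported can be pictured as trying a list of preferred formats in order and keeping the first one the platform accepts. The following is a minimal sketch of that idea in TypeScript, not the object's actual implementation. The helper name pickRecordingFormat, the stand-in predicate and the format strings are all hypothetical, chosen only for illustration. In a browser, the standard MediaRecorder.isTypeSupported method could be supplied as the predicate.

```typescript
// Hypothetical helper: returns the first format accepted by the predicate,
// or null when none of the preferred formats is supported.
function pickRecordingFormat(
  preferred: readonly string[],
  isSupported: (mimeType: string) => boolean
): string | null {
  for (const mimeType of preferred) {
    if (isSupported(mimeType)) {
      return mimeType;
    }
  }
  return null;
}

// Stand-in predicate for illustration only. In a browser this could be
// (m) => MediaRecorder.isTypeSupported(m).
const fakeSupport = (mimeType: string): boolean =>
  mimeType.startsWith("video/webm");

const chosen = pickRecordingFormat(
  ["video/mp4", "video/webm;codecs=vp9", "video/webm"],
  fakeSupport
);
console.log(chosen); // prints "video/webm;codecs=vp9"
```

Ordering the list from most to least preferred means a fallback format is used only when the better options are rejected, and a null result signals that recording should not be started at all.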

User Media actions

Start recording canvas
Stop recording canvas: If canvas video recording is supported (checked with the Is canvas recording supported condition), starts and stops recording a video of the canvas. When starting recording, various format and quality options can be chosen. Not all recording formats may be supported; use the Is recording format supported condition to check. After recording is stopped, On canvas recording ready triggers, at which point the recording can be accessed.
Get media sources: Request a list of media sources that can be used with the Request camera or Request microphone actions. For example, a mobile device may have both front-facing and back-facing cameras, or multiple microphones. Using the media source list allows the specific camera or microphone input to be selected. This does not complete immediately; the media source list is only available after the On retrieved media sources trigger fires. The browser may not support listing the media sources, in which case the trigger will never fire.
Request camera: Show a security prompt to the user requesting that they give the application permission to use camera input. Either On media request approved or On media request declined will trigger depending on their decision. If approved, the User Media object in the layout will start displaying a video feed from the user's camera device. The specific camera source to use can be chosen with the Source parameter, if media source listing is supported and a media source list has been requested; otherwise the default camera is used. If the preferred width/height are not zero, the nearest resolution supported by the input device is picked.
Request microphone: Show a security prompt to the user requesting that they give the application permission to use microphone input. Either On media request approved or On media request declined will trigger depending on their decision. For this to be useful, the Audio object must also be in the project, and Advanced audio supported must be true. A tag is given for the microphone input, and the audio input from the microphone is routed the same way as playing a sound with that tag. This means you can assign effects from the Audio object to the microphone input by adding the effects to the same tag assigned to the microphone. A useful combination is to add an analyser effect followed by a mute effect to the microphone input. This prevents the user hearing their own voice, but still allows peak, RMS and spectrum monitoring with the analyser. The specific microphone input to use can be chosen with the Source parameter, if media source listing is supported and a media source list has been requested; otherwise the default microphone input is used.
Snapshot: If the user has approved a camera request and the User Media object is showing a video feed, takes a snapshot of the current frame. The still image is then available from the SnapshotURL expression as a data URI representing the image. The image can be loaded into a Sprite or Tiled Background object using the Load image from URL action and passing SnapshotURL. This action optionally takes parameters specifying the compression format, which is useful if you intend to upload or save the image and a smaller file size would be advantageous.
Stop: Ends any active video feed or microphone input. Media input must be requested again before it can be used.
Request speech recognition: If Supports speech recognition is true, initiates speech recognition. Usually a permission prompt will appear asking the user if they want to allow the page to use their microphone input. The user must approve the permission prompt before On speech recognition start triggers. If there is a problem or permission is denied, On speech recognition error is triggered. Language specifies the spoken language to recognise. Use a tag like en for English, en-US for US English, en-GB for British English, and so on. Mode can be continuous, which keeps recognising speech until the page is closed or the Stop speech recognition action is used; or single phrase, which recognises speech until the user stops talking, then automatically stops speech recognition and triggers On speech recognition end. Results can be Interim to allow interim (unconfirmed) results which can change, accessed by the InterimTranscript expression; or Final to only return confirmed final results of speech recognition which will not change, accessed by the FinalTranscript expression.
Stop speech recognition: If speech recognition is currently active, ends the speech recognition. On speech recognition end will trigger.
Pause speaking
Resume speaking: Pause or resume text being read out by speech synthesis from the Speak text action.
Speak text: Read out some text using speech synthesis (also known as text-to-speech). The language, volume, rate and pitch of the voice that reads out the text can be customised. The Voice URI can be used to select a different kind of voice (e.g. male vs. female) from a list of the supported voices, if any alternatives are available. The list of possible voices can be retrieved using the VoiceCount and VoiceURIAt expressions.
Stop speaking: Stop reading out text from a previous Speak text action. The speech cannot be resumed.
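
To make the peak and RMS monitoring mentioned under Request microphone more concrete, the sketch below shows how those two levels can be computed from a block of audio samples. It is an illustrative calculation in TypeScript under stated assumptions, not the Audio object's analyser implementation: the names LevelReading, measureLevels and toDecibels are hypothetical, and the samples are assumed to be normalised to the range -1 to 1.

```typescript
interface LevelReading {
  peak: number; // largest absolute sample value in the block
  rms: number;  // root mean square of the samples in the block
}

// Hypothetical helper: measures peak and RMS of one block of samples.
function measureLevels(samples: ArrayLike<number>): LevelReading {
  if (samples.length === 0) {
    return { peak: 0, rms: 0 };
  }
  let peak = 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    const s = samples[i];
    peak = Math.max(peak, Math.abs(s));
    sumSquares += s * s;
  }
  return { peak, rms: Math.sqrt(sumSquares / samples.length) };
}

// Converts a linear level to decibels; a level of 0 maps to -Infinity.
function toDecibels(level: number): number {
  return 20 * Math.log10(level);
}

const reading = measureLevels([0.5, -0.5, 0.5, -0.5]);
console.log(reading.peak, reading.rms); // prints 0.5 0.5
```

Peak responds to the single loudest sample, while RMS reflects the average energy of the block, so RMS is usually the steadier value for driving something like a volume meter.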

User Media expressions

AmbientLux: The current ambient light reading in lux, or 0 if no appropriate sensor is present. On ambient light reading update triggers whenever this value changes.
CanvasRecordingURL: After On canvas recording ready triggers, this is the URL to the video recording that was made. Typically this will be downloaded by using the Browser object's Invoke download action to download this URL.
FinalTranscript: If speech recognition is active, returns the final transcript of confirmed results. This does not change, other than to add newly spoken words which have also been confirmed.
InterimTranscript: If speech recognition is active, returns the interim transcript of results. The Request speech recognition action must have specified Interim for the Results parameter. The text of this expression can change, as the speech recognition engine uses the sound input in real-time to refine the results and correct any misinterpreted words. Once the user has spoken far enough for the speech recognition engine to be confident of a final result, the word will disappear from InterimTranscript and be appended to FinalTranscript.
SpeechError: In On speech recognition error, contains a string which identifies the type of error. Possible values are: "no-speech", "aborted", "audio-capture", "network", "not-allowed", "service-not-allowed", "bad-grammar", or "language-not-supported". The most common errors are "not-allowed" if the user declined the permission prompt; "audio-capture" if no microphone is present; or "network" if the speech recognition is implemented by a remote server over the Internet which is currently unavailable.
VoiceCount: Return the number of voices available for use with speech synthesis.
VoiceLangAt(i)
VoiceNameAt(i)
VoiceURIAt(i): Return the language, name, or URI of the voice at the given zero-based index. This can be used to show the user a list of possible voices to choose from. To select a different voice, pass the appropriate voice URI to the Speak text action.
AudioSourceCount: After On retrieved media sources triggers, the number of audio sources available.
AudioSourceLabelAt(index): After On retrieved media sources triggers, the label of the audio source at the given index. The label is normally the name of the input or recording device, but it may be empty for security reasons (such as if the user has not yet approved a media request).
CameraSourceCount: After On retrieved media sources triggers, the number of camera sources available.
CameraSourceFacingAt(index): After On retrieved media sources triggers, a string indicating which way a camera source is facing. This can be "user" (the camera is facing the user, such as the front-facing camera on a phone), "environment" (the camera is facing away from the user, such as the back-facing camera on a phone), "left", "right", or empty if unknown or withheld for security reasons.
CameraSourceLabelAt(index): After On retrieved media sources triggers, the label of the camera source at the given index. The label is normally the name of the input device, but it may be empty for security reasons (such as if the user has not yet approved a media request).
SnapshotURL: A data URI representing the snapshotted image after a Snapshot action, otherwise an empty string. The image can be loaded into a Sprite or Tiled Background object using the Load image from URL action and passing SnapshotURL. Alternatively, the data URI can be sent to a server, saved to disk, downloaded with the Browser object, or used in any other way you like.
VideoWidth
VideoHeight: If a video feed is approved and active, this returns the size in pixels of the feed from the device (which may not be the same size as the object in the layout). If no feed is active then 0 is returned.
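
The camera source expressions above are typically used to pick an index to pass as the Source parameter of Request camera. The sketch below models that lookup in TypeScript, assuming the values of CameraSourceLabelAt and CameraSourceFacingAt have already been copied into an array. The CameraSource type, the findCameraIndex helper and the sample labels are hypothetical and only serve the illustration.

```typescript
interface CameraSource {
  label: string;  // mirrors CameraSourceLabelAt(index); may be empty
  facing: string; // mirrors CameraSourceFacingAt(index); may be empty
}

// Hypothetical helper: returns the index of the first camera with the
// wanted facing, or -1 if none matches (the caller can then fall back
// to the default camera).
function findCameraIndex(
  sources: readonly CameraSource[],
  wantedFacing: string
): number {
  return sources.findIndex((source) => source.facing === wantedFacing);
}

// Made-up sample data standing in for a retrieved media source list.
const sources: CameraSource[] = [
  { label: "Front camera", facing: "user" },
  { label: "Back camera", facing: "environment" },
];

const backIndex = findCameraIndex(sources, "environment");
console.log(backIndex); // prints 1
```

Matching on the facing string rather than the label is the safer choice here, since labels may be empty until the user has approved a media request.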