Video Content Recognition

Last updated: 2024-11-04 10:11:11

Since August 1, 2022, the audio/video content recognition feature of VOD has been a paid feature. For more information, see Video Recognition to Become Paid Feature.
Audio/video content recognition is an offline task that uses AI to recognize audio/video content. It recognizes faces, text, opening and closing segments, and speech in a video, helping you manage your audio/video files accurately and efficiently. It includes the following features:
Feature	Description	Use Cases
Face recognition	Recognizes faces in video images	Marks where celebrities appear in video images; checks for particular people in video images
Full speech recognition	Recognizes all words that occur in speech	Generates subtitles for speech content; performs data analysis on video speech content
Full text recognition	Recognizes all text that occurs in video images	Performs data analysis on text in video images
Speech keyword recognition	Recognizes keywords in speech	Checks for sensitive words in speech; retrieves specific keywords in speech
Text keyword recognition	Recognizes keywords in text that appears in video images	Checks for sensitive words in video images; retrieves specific keywords in video images
Opening and closing segment recognition	Recognizes opening and closing segments in videos	Marks the positions of the opening segment, closing segment, and feature presentation on the progress bar; removes the opening and closing segments of multiple videos at a time
Speech translation recognition	Recognizes all words that occur in speech and translates them into the specified language	Generates translated subtitles for short dramas; generates multilingual subtitles for recorded files of cross-border audio and video conferences
Some content recognition features depend on a material library. There are two types of libraries: public library and custom library.
Public library: VOD's preset material library.
Custom library: A material library that you create and manage yourself.
Recognition Type	Public Library	Custom Library
Face recognition	Supported. The library includes celebrities in the sports and entertainment industries, as well as other people.	Supported. You can use a server API to manage the custom face library.
Speech recognition	Not supported yet	Supported. You can use a server API to manage the custom keyword library.
Text recognition	Not supported yet	Supported. You can use a server API to manage the custom keyword library.
Audio/Video Content Recognition Template
Audio/Video content recognition integrates a number of recognition features. You can use parameters to control the following:
Which content recognition features to enable
Whether to use the public library or custom library for face recognition
The confidence score threshold that a face recognition result must reach to be returned
The labels of the faces to return
VOD provides preset video content recognition templates for common parameter combinations. You can also use a server API to create and manage custom templates.
Initiating a Task
You can initiate an audio/video recognition task by calling a server API, via the console, or by specifying the task when uploading videos. For details, see Task Initiation.
Below are the details:
Initiate a task by calling a server API: Call ProcessMedia, setting Definition in the request parameter AiRecognitionTask to the ID of the audio/video content recognition template.
Initiate a task via the console: Call the server API CreateProcedureTemplate to create an audio/video recognition task flow (MediaProcessTask.AiRecognitionTask), and use it to process videos in the console.
Specify a task when uploading videos from the server: Call the server API CreateProcedureTemplate to create an audio/video recognition task flow (MediaProcessTask.AiRecognitionTask). When calling ApplyUpload, set the parameter procedure to the task flow.
Specify a task when uploading videos from a client: Call the server API CreateProcedureTemplate to create an audio/video recognition task flow (MediaProcessTask.AiRecognitionTask). When generating a signature for upload, set the parameter procedure to the task flow.
Specify a task when uploading videos via the console: Call the server API CreateProcedureTemplate to create an audio/video recognition task flow (MediaProcessTask.AiRecognitionTask). When uploading videos via the console, select Auto-processing after upload and choose the task flow.
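The first option above (calling ProcessMedia directly) can be sketched as follows. This is a minimal illustration that only builds the request body; it does not sign or send the request. The helper name build_process_media_request is hypothetical, and the file ID and template ID 10 are simply reused from the sample callback later on this page. How the body is sent (SDK, signature, region) is left out and should be taken from the ProcessMedia API documentation.

```python
import json


def build_process_media_request(file_id: str, template_id: int) -> dict:
    """Build a ProcessMedia request body that runs content recognition.

    Hypothetical helper for illustration only. `template_id` is the ID of the
    audio/video content recognition template, passed as
    AiRecognitionTask.Definition as described in the list above.
    """
    return {
        "FileId": file_id,
        "AiRecognitionTask": {"Definition": template_id},
    }


if __name__ == "__main__":
    # Values reused from the sample callback on this page; replace with your own.
    body = build_process_media_request("5285890784363430543", 10)
    print(json.dumps(body, indent=2))
```

The other options in the list reference a task flow instead of passing the template ID on every call, which suits videos that should be processed automatically on upload.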
Getting the Result
After initiating an audio/video content recognition task, you can either wait for the asynchronous result notification or query the task to get its execution result. Below is an example of the result notification received in normal callback mode after a content recognition task is initiated (fields with null values are omitted):
{
    "EventType":"ProcedureStateChanged",
    "ProcedureStateChangeEvent":{
        "TaskId":"1400155958-Procedure-2e1af2456351812be963e309cc133403t0",
        "Status":"FINISH",
        "FileId":"5285890784363430543",
        "FileName":"Collection",
        "FileUrl":"http://1400155958.vod2.myqcloud.com/xxx/xxx/aHjWUx5Xo1EA.mp4",
        "MetaData":{
            "AudioDuration":243,
            "AudioStreamSet":[
                {
                    "Bitrate":125599,
                    "Codec":"aac",
                    "SamplingRate":48000
                }
            ],
            "Bitrate":1459299,
            "Container":"mov,mp4,m4a,3gp,3g2,mj2",
            "Duration":243,
            "Height":1080,
            "Rotate":0,
            "Size":44583593,
            "VideoDuration":243,
            "VideoStreamSet":[
                {
                    "Bitrate":1333700,
                    "Codec":"h264",
                    "Fps":29,
                    "Height":1080,
                    "Width":1920
                }
            ],
            "Width":1920
        },
        "AiRecognitionResultSet":[
            {
                "Type":"FaceRecognition",
                "FaceRecognitionTask":{
                    "Status":"SUCCESS",
                    "ErrCode":0,
                    "Message":"",
                    "Input":{
                        "Definition":10
                    },
                    "Output":{
                        "ResultSet":[
                            {
                                "Id":183213,
                                "Type":"Default",
                                "Name":"John Smith",
                                "SegmentSet":[
                                    {
                                        "StartTimeOffset":10,
                                        "EndTimeOffset":12,
                                        "Confidence":97,
                                        "AreaCoordSet":[
                                            830,
                                            783,
                                            1030,
                                            599
                                        ]
                                    },
                                    {
                                        "StartTimeOffset":12,
                                        "EndTimeOffset":14,
                                        "Confidence":97,
                                        "AreaCoordSet":[
                                            844,
                                            791,
                                            1040,
                                            614
                                        ]
                                    }
                                ]
                            },
                            {
                                "Id":236099,
                                "Type":"Default",
                                "Name":"Jane Smith",
                                "SegmentSet":[
                                    {
                                        "StartTimeOffset":120,
                                        "EndTimeOffset":122,
                                        "Confidence":96,
                                        "AreaCoordSet":[
                                            579,
                                            903,
                                            812,
                                            730
                                        ]
                                    }
                                ]
                            }
                        ]
                    }
                }
            }
        ],
        "TasksPriority":0,
        "TasksNotifyMode":""
    }
}
In the callback result, ProcedureStateChangeEvent.AiRecognitionResultSet contains the result of face recognition (Type is FaceRecognition).
According to the content of Output.ResultSet, two people are recognized: John Smith and Jane Smith. SegmentSet indicates when (from StartTimeOffset to EndTimeOffset) and where (coordinates specified by AreaCoordSet) the two people appear in the video.
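As a rough sketch of how such a callback might be consumed, the Python snippet below walks AiRecognitionResultSet, keeps the face recognition entries, and merges touching or overlapping segments so that each person maps to a list of continuous appearance intervals. The function name summarize_faces and the min_confidence parameter are assumptions made for illustration (a client-side filter, separate from the threshold configured in the template), and the sample event is a trimmed copy of the callback above.

```python
from typing import Any


def summarize_faces(event: dict[str, Any], min_confidence: float = 0) -> dict[str, list[tuple[float, float]]]:
    """Map each recognized name to merged (StartTimeOffset, EndTimeOffset) intervals."""
    appearances: dict[str, list[tuple[float, float]]] = {}
    state = event.get("ProcedureStateChangeEvent") or {}
    for result in state.get("AiRecognitionResultSet") or []:
        if result.get("Type") != "FaceRecognition":
            continue  # other recognition types are ignored in this sketch
        task = result.get("FaceRecognitionTask") or {}
        if task.get("Status") != "SUCCESS":
            continue
        for person in (task.get("Output") or {}).get("ResultSet") or []:
            # Keep only segments at or above the (hypothetical) client-side threshold.
            segments = sorted(
                (seg["StartTimeOffset"], seg["EndTimeOffset"])
                for seg in person.get("SegmentSet") or []
                if seg.get("Confidence", 0) >= min_confidence
            )
            merged: list[tuple[float, float]] = []
            for start, end in segments:
                if merged and start <= merged[-1][1]:
                    # Touching or overlapping: extend the previous interval.
                    merged[-1] = (merged[-1][0], max(merged[-1][1], end))
                else:
                    merged.append((start, end))
            if merged:
                appearances.setdefault(person["Name"], []).extend(merged)
    return appearances


# Trimmed copy of the sample callback shown above.
SAMPLE_EVENT = {
    "EventType": "ProcedureStateChanged",
    "ProcedureStateChangeEvent": {
        "Status": "FINISH",
        "AiRecognitionResultSet": [{
            "Type": "FaceRecognition",
            "FaceRecognitionTask": {
                "Status": "SUCCESS",
                "Output": {"ResultSet": [
                    {"Name": "John Smith", "SegmentSet": [
                        {"StartTimeOffset": 10, "EndTimeOffset": 12, "Confidence": 97},
                        {"StartTimeOffset": 12, "EndTimeOffset": 14, "Confidence": 97},
                    ]},
                    {"Name": "Jane Smith", "SegmentSet": [
                        {"StartTimeOffset": 120, "EndTimeOffset": 122, "Confidence": 96},
                    ]},
                ]},
            },
        }],
    },
}

if __name__ == "__main__":
    print(summarize_faces(SAMPLE_EVENT))
```

For the sample, the two adjacent John Smith segments (10 to 12 and 12 to 14) collapse into a single interval from 10 to 14, which is usually the more convenient shape for marking appearances on a progress bar.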
