Transmitting Video with WebRTC in Java: Port Restrictions and Custom Codecs

Introduction

In this article we continue our walkthrough of overriding parts of the WebRTC native library, focusing on how to restrict which ports it uses and how to replace the video encoding pipeline. My other notes on using WebRTC in Java are collected in the series "Using WebRTC in Java"; readers interested in this topic may want to browse it. The source code for this article can be obtained through the WeChat official account linked at the end of the article, or as a paid download.

Restricting Connection Ports

Let's first review the overall flow of the port restriction set up earlier. When creating the PeerConnectionFactory we instantiated a SocketFactory and a default NetworkManager; later, when creating a PeerConnection, we used these two instances to build a PortAllocator and injected that PortAllocator into the PeerConnection. Throughout this flow, the code that actually enforces the port restriction lives in the SocketFactory, although the PortAllocator API is used as well. You may wonder: doesn't PortAllocator already expose an interface for limiting the port range? Why do we still need the SocketFactory?

std::unique_ptr<cricket::PortAllocator> port_allocator(
    new cricket::BasicPortAllocator(network_manager.get(), socket_factory.get()));
port_allocator->SetPortRange(this->min_port, this->max_port); // PortAllocator's own port-range API
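For context, here is a minimal sketch of how these pieces are typically wired together when the PeerConnection is created. The variable names (peer_connection_factory, observer, etc.) are placeholders, and the exact CreatePeerConnection overload differs between WebRTC revisions, so treat this as illustrative rather than exact:

// Sketch: wiring the restricted PortAllocator into a new PeerConnection.
// network_manager / socket_factory are the instances created alongside the factory.
auto port_allocator = absl::make_unique<cricket::BasicPortAllocator>(
    network_manager.get(), socket_factory.get());
port_allocator->SetPortRange(min_port, max_port);

webrtc::PeerConnectionInterface::RTCConfiguration config;
rtc::scoped_refptr<webrtc::PeerConnectionInterface> peer_connection =
    peer_connection_factory->CreatePeerConnection(
        config, std::move(port_allocator), /*cert_generator=*/nullptr, observer);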

At first I also only restricted the ports through the SetPortRange API, but I found that WebRTC would still open sockets on ports outside that range for other purposes. So in the end I overrode the SocketFactory directly and rejected every request for an out-of-range port. In addition, since our servers have some subnet IPs that must not be used, I handle those in the SocketFactory as well. My implementation looks like this:

rtc::AsyncPacketSocket *
rtc::SocketFactoryWrapper::CreateUdpSocket(const rtc::SocketAddress &local_address, uint16_t min_port,
                                           uint16_t max_port) {
    // Reject ports outside the allowed range
    if (min_port < this->min_port || max_port > this->max_port) {
        WEBRTC_LOG("Create udp socket cancelled, port out of range, expect port range is:" +
                   std::to_string(this->min_port) + "->" + std::to_string(this->max_port) +
                   ", parameter port range is: " + std::to_string(min_port) + "->" + std::to_string(max_port),
                   LogLevel::INFO);
        return nullptr;
    }
    // Reject IPs that are not allowed
    if (!local_address.IsPrivateIP() || local_address.HostAsURIString().find(this->white_private_ip_prefix) == 0) {
        rtc::AsyncPacketSocket *result = BasicPacketSocketFactory::CreateUdpSocket(local_address, min_port, max_port);
        const auto *address = static_cast<const void *>(result);
        std::stringstream ss;
        ss << address;
        WEBRTC_LOG("Create udp socket, min port is:" + std::to_string(min_port) + ", max port is: " +
                   std::to_string(max_port) + ", result is: " + result->GetLocalAddress().ToString() + "->" +
                   result->GetRemoteAddress().ToString() + ", new socket address is: " + ss.str(), LogLevel::INFO);

        return result;
    } else {
        WEBRTC_LOG("Create udp socket cancelled, this ip is not in white list:" + local_address.HostAsURIString(),
                   LogLevel::INFO);
        return nullptr;
    }
}
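For completeness, here is what the surrounding wrapper class might look like. This declaration is my own reconstruction and is not shown in the original code; only the members used by CreateUdpSocket above (min_port, max_port, white_private_ip_prefix) are taken from the implementation:

// Hypothetical sketch of the wrapper's declaration.
namespace rtc {

class SocketFactoryWrapper : public rtc::BasicPacketSocketFactory {
public:
    SocketFactoryWrapper(rtc::Thread *thread, std::string white_private_ip_prefix,
                         uint16_t min_port, uint16_t max_port)
        : rtc::BasicPacketSocketFactory(thread),
          white_private_ip_prefix(std::move(white_private_ip_prefix)),
          min_port(min_port), max_port(max_port) {}

    rtc::AsyncPacketSocket *CreateUdpSocket(const rtc::SocketAddress &local_address,
                                            uint16_t min_port, uint16_t max_port) override;

private:
    std::string white_private_ip_prefix;
    uint16_t min_port;
    uint16_t max_port;
};

}  // namespace rtc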

Custom Video Encoding

As you may already know, WebRTC encodes with VP8 by default, and the common view is that VP8 is not as good as H264. In addition, Safari does not support VP8, so when communicating with Safari, WebRTC falls back to OpenH264 for video encoding, and OpenH264 in turn is less efficient than libx264. My improvements to the encoding part therefore focus on the following (a sketch of how a custom encoder factory is handed to WebRTC follows the list):

  1. Replace the default codec with H264
  2. Encode video with libx264 through FFmpeg, and use GPU acceleration (h264_nvenc) when the host has a decent GPU
  3. Support changing the transmission bitrate at runtime
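Before going through each item, here is a rough sketch of where the custom factory plugs in. The class name FFmpegVideoEncoderFactory is a placeholder of mine, and the exact CreatePeerConnectionFactory overload depends on the WebRTC revision, so take this as illustrative only:

// Sketch: injecting a custom video encoder factory when building the
// PeerConnectionFactory. FFmpegVideoEncoderFactory is a placeholder name for
// the factory described in the following sections.
rtc::scoped_refptr<webrtc::PeerConnectionFactoryInterface> peer_connection_factory =
    webrtc::CreatePeerConnectionFactory(
        network_thread, worker_thread, signaling_thread,
        /*default_adm=*/nullptr,
        webrtc::CreateBuiltinAudioEncoderFactory(),
        webrtc::CreateBuiltinAudioDecoderFactory(),
        absl::make_unique<FFmpegVideoEncoderFactory>(/*hardware_accelerate=*/true),
        webrtc::CreateBuiltinVideoDecoderFactory(),
        /*audio_mixer=*/nullptr,
        /*audio_processing=*/nullptr);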

Replacing the Default Codec

Replacing the default codec with H264 is fairly simple: we only need to override GetSupportedFormats of VideoEncoderFactory:

// Returns a list of supported video formats in order of preference, to use
// for signaling etc.
std::vector<webrtc::SdpVideoFormat> GetSupportedFormats() const override {
    return GetAllSupportedFormats();
}

// Here I only advertise H264, with packetization mode NonInterleaved
std::vector<webrtc::SdpVideoFormat> GetAllSupportedFormats() {
    std::vector<webrtc::SdpVideoFormat> supported_codecs;
    supported_codecs.emplace_back(CreateH264Format(webrtc::H264::kProfileBaseline, webrtc::H264::kLevel3_1, "1"));
    return supported_codecs;
}

webrtc::SdpVideoFormat CreateH264Format(webrtc::H264::Profile profile,
                                        webrtc::H264::Level level,
                                        const std::string &packetization_mode) {
    const absl::optional<std::string> profile_string =
        webrtc::H264::ProfileLevelIdToString(webrtc::H264::ProfileLevelId(profile, level));
    RTC_CHECK(profile_string);
    return webrtc::SdpVideoFormat(cricket::kH264CodecName,
                                  {{cricket::kH264FmtpProfileLevelId, *profile_string},
                                   {cricket::kH264FmtpLevelAsymmetryAllowed, "1"},
                                   {cricket::kH264FmtpPacketizationMode, packetization_mode}});
}
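The article only shows GetSupportedFormats, but the factory also has to hand out encoder instances. A minimal sketch of the remaining pieces follows; the class name, constructor flag, and the QueryVideoEncoder values are my assumptions rather than the author's original code:

// Sketch of a complete factory. CreateVideoEncoder returns the FFmpeg-based
// encoder implemented in the next section.
class FFmpegVideoEncoderFactory : public webrtc::VideoEncoderFactory {
public:
    explicit FFmpegVideoEncoderFactory(bool hardware_accelerate)
        : hardware_accelerate_(hardware_accelerate) {}

    std::vector<webrtc::SdpVideoFormat> GetSupportedFormats() const override {
        return GetAllSupportedFormats();
    }

    CodecInfo QueryVideoEncoder(const webrtc::SdpVideoFormat &format) const override {
        CodecInfo info;
        info.is_hardware_accelerated = hardware_accelerate_;
        info.has_internal_source = false;
        return info;
    }

    std::unique_ptr<webrtc::VideoEncoder> CreateVideoEncoder(
            const webrtc::SdpVideoFormat &format) override {
        return absl::make_unique<FFmpegH264EncoderImpl>(cricket::VideoCodec(format),
                                                        hardware_accelerate_);
    }

private:
    bool hardware_accelerate_;
};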

Implementing the Encoder

Next comes the implementation of the VideoEncoder interface on top of FFmpeg; for the FFmpeg side I mainly followed the official examples. Let's first look at which parts of the VideoEncoder interface we need to implement:

FFmpegH264EncoderImpl(const cricket::VideoCodec &codec, bool hardware_accelerate);

~FFmpegH264EncoderImpl() override;

// |max_payload_size| is ignored.
// The following members of |codec_settings| are used. The rest are ignored.
// - codecType (must be kVideoCodecH264)
// - targetBitrate
// - maxFramerate
// - width
// - height
// Initialize the encoder
int32_t InitEncode(const webrtc::VideoCodec *codec_settings,
                   int32_t number_of_cores,
                   size_t max_payload_size) override;

// Release resources
int32_t Release() override;

// The callback through which encoded frames are handed back once encoding is done
int32_t RegisterEncodeCompleteCallback(
    webrtc::EncodedImageCallback *callback) override;

// WebRTC's own rate controller; it adjusts the bitrate according to current network conditions
int32_t SetRateAllocation(const webrtc::VideoBitrateAllocation &bitrate_allocation,
                          uint32_t framerate) override;

// The result of encoding - an EncodedImage and RTPFragmentationHeader - are
// passed to the encode complete callback.
int32_t Encode(const webrtc::VideoFrame &frame,
               const webrtc::CodecSpecificInfo *codec_specific_info,
               const std::vector<webrtc::FrameType> *frame_types) override;

When implementing this interface I referred to WebRTC's own OpenH264Encoder. Note that WebRTC supports simulcast, so there may be several encoder instances, one per stream. Since this part is fairly involved, I will walk through my implementation step by step.
Let me first introduce the struct and member variables defined here:

// This struct holds all resources belonging to one encoder instance
typedef struct {
    AVCodec *codec = nullptr;          // points to the codec implementation
    AVFrame *frame = nullptr;          // raw pixel data before encoding
    AVCodecContext *context = nullptr; // codec context holding the encoder's parameter settings
    AVPacket *pkt = nullptr;           // packet holding the encoded bitstream
} CodecCtx;

// Encoder instances
std::vector<CodecCtx *> encoders_;
// Per-encoder configuration
std::vector<LayerConfig> configurations_;
// Encoded images
std::vector<webrtc::EncodedImage> encoded_images_;
// Buffers backing the encoded images
std::vector<std::unique_ptr<uint8_t[]>> encoded_image_buffers_;
// Codec settings
webrtc::VideoCodec codec_;
webrtc::H264PacketizationMode packetization_mode_;
size_t max_payload_size_;
int32_t number_of_cores_;
// Callback invoked after encoding
webrtc::EncodedImageCallback *encoded_image_callback_;
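The LayerConfig type used above comes from WebRTC's own H264Encoder (modules/video_coding/codecs/h264) and is not shown in the article. The following rough reconstruction covers only the members this implementation relies on; treat it as an assumption rather than the original definition:

// Reconstructed sketch of the per-stream configuration, inferred from its
// usage in InitEncode / SetRateAllocation / Encode below.
struct LayerConfig {
    int simulcast_idx = 0;
    int width = -1;
    int height = -1;
    bool sending = true;
    bool key_frame_request = false;
    float max_frame_rate = 0;
    uint32_t target_bps = 0;
    uint32_t max_bps = 0;
    bool frame_dropping_on = false;
    int key_frame_interval = 0;

    // Pause or resume this stream; resuming requests a fresh key frame.
    void SetStreamState(bool send_stream) {
        if (send_stream && !sending) {
            key_frame_request = true;
        }
        sending = send_stream;
    }
};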

The constructor is straightforward: it records the packetization mode and reserves space:

FFmpegH264EncoderImpl::FFmpegH264EncoderImpl(const cricket::VideoCodec &codec, bool hardware)
    : packetization_mode_(webrtc::H264PacketizationMode::SingleNalUnit),
      max_payload_size_(0),
      hardware_accelerate(hardware),
      number_of_cores_(0),
      encoded_image_callback_(nullptr),
      has_reported_init_(false),
      has_reported_error_(false) {
    RTC_CHECK(cricket::CodecNamesEq(codec.name, cricket::kH264CodecName));
    std::string packetization_mode_string;
    if (codec.GetParam(cricket::kH264FmtpPacketizationMode,
                       &packetization_mode_string) &&
        packetization_mode_string == "1") {
        packetization_mode_ = webrtc::H264PacketizationMode::NonInterleaved;
    }
    encoded_images_.reserve(webrtc::kMaxSimulcastStreams);
    encoded_image_buffers_.reserve(webrtc::kMaxSimulcastStreams);
    encoders_.reserve(webrtc::kMaxSimulcastStreams);
    configurations_.reserve(webrtc::kMaxSimulcastStreams);
}

Then comes the crucial encoder-initialization step. Here I first run a few sanity checks and then create an encoder instance for each stream:

int32_t FFmpegH264EncoderImpl::InitEncode(const webrtc::VideoCodec *inst,
int32_t number_of_cores,
size_t max_payload_size) {
ReportInit();
if (!inst || inst->codecType != webrtc::kVideoCodecH264) {
ReportError();
return WEBRTC_VIDEO_CODEC_ERR_PARAMETER;
}
if (inst->maxFramerate == 0) {
ReportError();
return WEBRTC_VIDEO_CODEC_ERR_PARAMETER;
}
if (inst->width < 1 || inst->height < 1) {
ReportError();
return WEBRTC_VIDEO_CODEC_ERR_PARAMETER;
}

int32_t release_ret = Release();
if (release_ret != WEBRTC_VIDEO_CODEC_OK) {
ReportError();
return release_ret;
}

int number_of_streams = webrtc::SimulcastUtility::NumberOfSimulcastStreams(*inst);
bool doing_simulcast = (number_of_streams > 1);

if (doing_simulcast && (!webrtc::SimulcastUtility::ValidSimulcastResolutions(
*inst, number_of_streams) ||
!webrtc::SimulcastUtility::ValidSimulcastTemporalLayers(
*inst, number_of_streams))) {
return WEBRTC_VIDEO_CODEC_ERR_SIMULCAST_PARAMETERS_NOT_SUPPORTED;
}
encoded_images_.resize(static_cast<unsigned long>(number_of_streams));
encoded_image_buffers_.resize(static_cast<unsigned long>(number_of_streams));
encoders_.resize(static_cast<unsigned long>(number_of_streams));
configurations_.resize(static_cast<unsigned long>(number_of_streams));
for (int i = 0; i < number_of_streams; i++) {
encoders_[i] = new CodecCtx();
}
number_of_cores_ = number_of_cores;
max_payload_size_ = max_payload_size;
codec_ = *inst;

// Code expects simulcastStream resolutions to be correct, make sure they are
// filled even when there are no simulcast layers.
if (codec_.numberOfSimulcastStreams == 0) {
codec_.simulcastStream[0].width = codec_.width;
codec_.simulcastStream[0].height = codec_.height;
}

for (int i = 0, idx = number_of_streams - 1; i < number_of_streams;
++i, --idx) {
// Temporal layers still not supported.
if (inst->simulcastStream[i].numberOfTemporalLayers > 1) {
Release();
return WEBRTC_VIDEO_CODEC_ERR_SIMULCAST_PARAMETERS_NOT_SUPPORTED;
}


// Set internal settings from codec_settings
configurations_[i].simulcast_idx = idx;
configurations_[i].sending = false;
configurations_[i].width = codec_.simulcastStream[idx].width;
configurations_[i].height = codec_.simulcastStream[idx].height;
configurations_[i].max_frame_rate = static_cast<float>(codec_.maxFramerate);
configurations_[i].frame_dropping_on = codec_.H264()->frameDroppingOn;
configurations_[i].key_frame_interval = codec_.H264()->keyFrameInterval;

// Codec_settings uses kbits/second; encoder uses bits/second.
configurations_[i].max_bps = codec_.maxBitrate * 1000;
configurations_[i].target_bps = codec_.startBitrate * 1000;
if (!OpenEncoder(encoders_[i], configurations_[i])) {
Release();
ReportError();
return WEBRTC_VIDEO_CODEC_ERROR;
}
// Initialize encoded image. Default buffer size: size of unencoded data.
encoded_images_[i]._size =
CalcBufferSize(webrtc::VideoType::kI420, codec_.simulcastStream[idx].width,
codec_.simulcastStream[idx].height);
encoded_images_[i]._buffer = new uint8_t[encoded_images_[i]._size];
encoded_image_buffers_[i].reset(encoded_images_[i]._buffer);
encoded_images_[i]._completeFrame = true;
encoded_images_[i]._encodedWidth = codec_.simulcastStream[idx].width;
encoded_images_[i]._encodedHeight = codec_.simulcastStream[idx].height;
encoded_images_[i]._length = 0;
}

webrtc::SimulcastRateAllocator init_allocator(codec_);
webrtc::BitrateAllocation allocation = init_allocator.GetAllocation(
codec_.startBitrate * 1000, codec_.maxFramerate);
return SetRateAllocation(allocation, codec_.maxFramerate);
}

// OpenEncoder creates the actual encoder. One subtle point: when allocating the AVFrame, remember to request 32-byte memory alignment, which we already covered when capturing image data.
bool FFmpegH264EncoderImpl::OpenEncoder(FFmpegH264EncoderImpl::CodecCtx *ctx, H264Encoder::LayerConfig &config) {
int ret;
/* find the mpeg1 video encoder */
#ifdef WEBRTC_LINUX
if (hardware_accelerate) {
ctx->codec = avcodec_find_encoder_by_name("h264_nvenc");
}
#endif
if (!ctx->codec) {
ctx->codec = avcodec_find_encoder_by_name("libx264");
}
if (!ctx->codec) {
WEBRTC_LOG("Codec not found", ERROR);
return false;
}
WEBRTC_LOG("Open encoder: " + std::string(ctx->codec->name) + ", and generate frame, packet", INFO);

ctx->context = avcodec_alloc_context3(ctx->codec);
if (!ctx->context) {
WEBRTC_LOG("Could not allocate video codec context", ERROR);
return false;
}
config.target_bps = config.max_bps;
SetContext(ctx, config, true);
/* open it */
ret = avcodec_open2(ctx->context, ctx->codec, nullptr);
if (ret < 0) {
WEBRTC_LOG("Could not open codec, error code:" + std::to_string(ret), ERROR);
avcodec_free_context(&(ctx->context));
return false;
}

ctx->frame = av_frame_alloc();
if (!ctx->frame) {
WEBRTC_LOG("Could not allocate video frame", ERROR);
return false;
}
ctx->frame->format = ctx->context->pix_fmt;
ctx->frame->width = ctx->context->width;
ctx->frame->height = ctx->context->height;
ctx->frame->color_range = ctx->context->color_range;
/* the image can be allocated by any means and av_image_alloc() is
* just the most convenient way if av_malloc() is to be used */
ret = av_image_alloc(ctx->frame->data, ctx->frame->linesize, ctx->context->width, ctx->context->height,
ctx->context->pix_fmt, 32);
if (ret < 0) {
WEBRTC_LOG("Could not allocate raw picture buffer", ERROR);
return false;
}
ctx->frame->pts = 1;
ctx->pkt = av_packet_alloc();
return true;
}

// Configure the FFmpeg encoder parameters
void FFmpegH264EncoderImpl::SetContext(CodecCtx *ctx, H264Encoder::LayerConfig &config, bool init) {
if (init) {
AVRational rational = {1, 25};
ctx->context->time_base = rational;
ctx->context->max_b_frames = 0;
ctx->context->pix_fmt = AV_PIX_FMT_YUV420P;
ctx->context->codec_type = AVMEDIA_TYPE_VIDEO;
ctx->context->codec_id = AV_CODEC_ID_H264;
ctx->context->gop_size = config.key_frame_interval;
ctx->context->color_range = AVCOL_RANGE_JPEG;
// 设置两个参数让编码过程更快
if (std::string(ctx->codec->name) == "libx264") {
av_opt_set(ctx->context->priv_data, "preset", "ultrafast", 0);
av_opt_set(ctx->context->priv_data, "tune", "zerolatency", 0);
}
av_log_set_level(AV_LOG_ERROR);
WEBRTC_LOG("Init bitrate: " + std::to_string(config.target_bps), INFO);
} else {
WEBRTC_LOG("Change bitrate: " + std::to_string(config.target_bps), INFO);
}
config.key_frame_request = true;
ctx->context->width = config.width;
ctx->context->height = config.height;

ctx->context->bit_rate = config.target_bps * 0.7;
ctx->context->rc_max_rate = config.target_bps * 0.85;
ctx->context->rc_min_rate = config.target_bps * 0.1;
ctx->context->rc_buffer_size = config.target_bps * 2; // Changing rc_buffer_size triggers libx264's rate control; without it the bitrate settings above do not take effect
#ifdef WEBRTC_LINUX
if (std::string(ctx->codec->name) == "h264_nvenc") { // Reach into h264_nvenc's private data (much like Java reflection) to set its bitrate
NvencContext* nvenc_ctx = (NvencContext*)ctx->context->priv_data;
nvenc_ctx->encode_config.rcParams.averageBitRate = ctx->context->bit_rate;
nvenc_ctx->encode_config.rcParams.maxBitRate = ctx->context->rc_max_rate;
return;
}
#endif
}

The last few lines of SetContext deal with changing the encoder bitrate dynamically. This is probably the most hardcore part of the whole encoder setup, and it is exactly how I achieve runtime bitrate control for both libx264 and h264_nvenc.
With the big chunk on encoder initialization out of the way, let's relax and look at two simple interfaces: one registers the encode-complete callback, the other is the entry point for WebRTC's rate-control module, which, as mentioned earlier, adjusts the encoding bitrate according to network conditions.

int32_t FFmpegH264EncoderImpl::RegisterEncodeCompleteCallback(
webrtc::EncodedImageCallback *callback) {
encoded_image_callback_ = callback;
return WEBRTC_VIDEO_CODEC_OK;
}

int32_t FFmpegH264EncoderImpl::SetRateAllocation(
const webrtc::BitrateAllocation &bitrate,
uint32_t new_framerate) {
if (encoders_.empty())
return WEBRTC_VIDEO_CODEC_UNINITIALIZED;

if (new_framerate < 1)
return WEBRTC_VIDEO_CODEC_ERR_PARAMETER;

if (bitrate.get_sum_bps() == 0) {
// Encoder paused, turn off all encoding.
for (auto &configuration : configurations_)
configuration.SetStreamState(false);
return WEBRTC_VIDEO_CODEC_OK;
}

// At this point, bitrate allocation should already match codec settings.
if (codec_.maxBitrate > 0)
RTC_DCHECK_LE(bitrate.get_sum_kbps(), codec_.maxBitrate);
RTC_DCHECK_GE(bitrate.get_sum_kbps(), codec_.minBitrate);
if (codec_.numberOfSimulcastStreams > 0)
RTC_DCHECK_GE(bitrate.get_sum_kbps(), codec_.simulcastStream[0].minBitrate);

codec_.maxFramerate = new_framerate;

size_t stream_idx = encoders_.size() - 1;
for (size_t i = 0; i < encoders_.size(); ++i, --stream_idx) {
// Update layer config.
configurations_[i].target_bps = bitrate.GetSpatialLayerSum(stream_idx);
configurations_[i].max_frame_rate = static_cast<float>(new_framerate);

if (configurations_[i].target_bps) {
configurations_[i].SetStreamState(true);
SetContext(encoders_[i], configurations_[i], false);
} else {
configurations_[i].SetStreamState(false);
}
}

return WEBRTC_VIDEO_CODEC_OK;
}

Break's over; let's tackle the last tough bone: the encoding routine itself. It looks simple, but it hides a big pitfall.

int32_t FFmpegH264EncoderImpl::Encode(const webrtc::VideoFrame &input_frame,
const webrtc::CodecSpecificInfo *codec_specific_info,
const std::vector<webrtc::FrameType> *frame_types) {
// Routine sanity checks first
if (encoders_.empty()) {
ReportError();
return WEBRTC_VIDEO_CODEC_UNINITIALIZED;
}
if (!encoded_image_callback_) {
RTC_LOG(LS_WARNING)
<< "InitEncode() has been called, but a callback function "
<< "has not been set with RegisterEncodeCompleteCallback()";
ReportError();
return WEBRTC_VIDEO_CODEC_UNINITIALIZED;
}

// Get the video frame buffer
webrtc::I420BufferInterface *frame_buffer = (webrtc::I420BufferInterface *) input_frame.video_frame_buffer().get();
// Check whether the next frame should be a key frame; a bitrate change usually requests one
bool send_key_frame = false;
for (auto &configuration : configurations_) {
if (configuration.key_frame_request && configuration.sending) {
send_key_frame = true;
break;
}
}
if (!send_key_frame && frame_types) {
for (size_t i = 0; i < frame_types->size() && i < configurations_.size();
++i) {
if ((*frame_types)[i] == webrtc::kVideoFrameKey && configurations_[i].sending) {
send_key_frame = true;
break;
}
}
}

RTC_DCHECK_EQ(configurations_[0].width, frame_buffer->width());
RTC_DCHECK_EQ(configurations_[0].height, frame_buffer->height());

// Encode image for each layer.
for (size_t i = 0; i < encoders_.size(); ++i) {
// EncodeFrame input.
copyFrame(encoders_[i]->frame, frame_buffer);
if (!configurations_[i].sending) {
continue;
}
if (frame_types != nullptr) {
// Skip frame?
if ((*frame_types)[i] == webrtc::kEmptyFrame) {
continue;
}
}
// Tell the encoder to emit a key frame
if (send_key_frame || encoders_[i]->frame->pts % configurations_[i].key_frame_interval == 0) {
// API doc says ForceIntraFrame(false) does nothing, but calling this
// function forces a key frame regardless of the |bIDR| argument's value.
// (If every frame is a key frame we get lag/delays.)
encoders_[i]->frame->key_frame = 1;
encoders_[i]->frame->pict_type = AV_PICTURE_TYPE_I;
configurations_[i].key_frame_request = false;
} else {
encoders_[i]->frame->key_frame = 0;
encoders_[i]->frame->pict_type = AV_PICTURE_TYPE_P;
}

// Encode!
int got_output;
int enc_ret;
// Feed the picture to the encoder
enc_ret = avcodec_send_frame(encoders_[i]->context, encoders_[i]->frame);
if (enc_ret != 0) {
WEBRTC_LOG("FFMPEG send frame failed, returned " + std::to_string(enc_ret), ERROR);
ReportError();
return WEBRTC_VIDEO_CODEC_ERROR;
}
encoders_[i]->frame->pts++;
while (enc_ret >= 0) {
// Receive encoded packets from the encoder
enc_ret = avcodec_receive_packet(encoders_[i]->context, encoders_[i]->pkt);
if (enc_ret == AVERROR(EAGAIN) || enc_ret == AVERROR_EOF) {
break;
} else if (enc_ret < 0) {
WEBRTC_LOG("FFMPEG receive frame failed, returned " + std::to_string(enc_ret), ERROR);
ReportError();
return WEBRTC_VIDEO_CODEC_ERROR;
}

// Convert the encoder output into the frame representation WebRTC expects
encoded_images_[i]._encodedWidth = static_cast<uint32_t>(configurations_[i].width);
encoded_images_[i]._encodedHeight = static_cast<uint32_t>(configurations_[i].height);
encoded_images_[i].SetTimestamp(input_frame.timestamp());
encoded_images_[i].ntp_time_ms_ = input_frame.ntp_time_ms();
encoded_images_[i].capture_time_ms_ = input_frame.render_time_ms();
encoded_images_[i].rotation_ = input_frame.rotation();
encoded_images_[i].content_type_ =
(codec_.mode == webrtc::VideoCodecMode::kScreensharing)
? webrtc::VideoContentType::SCREENSHARE
: webrtc::VideoContentType::UNSPECIFIED;
encoded_images_[i].timing_.flags = webrtc::VideoSendTiming::kInvalid;
encoded_images_[i]._frameType = ConvertToVideoFrameType(encoders_[i]->frame);

// Split encoded image up into fragments. This also updates
// |encoded_image_|.
// Here is the pitfall mentioned earlier: in the frames FFmpeg produces, NAL units may be
// preceded by either a 0001 or a 001 start code, while WebRTC only recognizes NALUs that
// start with 0001. So below we post-process the encoder output and build an RTP
// fragmentation header describing the frame's layout.
webrtc::RTPFragmentationHeader frag_header;
RtpFragmentize(&encoded_images_[i], &encoded_image_buffers_[i], *frame_buffer, encoders_[i]->pkt,
&frag_header);
av_packet_unref(encoders_[i]->pkt);
// Encoder can skip frames to save bandwidth in which case
// |encoded_images_[i]._length| == 0.
if (encoded_images_[i]._length > 0) {
// Parse QP.
h264_bitstream_parser_.ParseBitstream(encoded_images_[i]._buffer,
encoded_images_[i]._length);
h264_bitstream_parser_.GetLastSliceQp(&encoded_images_[i].qp_);

// Deliver encoded image.
webrtc::CodecSpecificInfo codec_specific;
codec_specific.codecType = webrtc::kVideoCodecH264;
codec_specific.codecSpecific.H264.packetization_mode =
packetization_mode_;
codec_specific.codecSpecific.H264.simulcast_idx = static_cast<uint8_t>(configurations_[i].simulcast_idx);
encoded_image_callback_->OnEncodedImage(encoded_images_[i],
&codec_specific, &frag_header);
}
}
}

return WEBRTC_VIDEO_CODEC_OK;
}
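The copyFrame helper used above is not shown in the article either. Assuming the input is an I420 buffer and the AVFrame was allocated with matching dimensions and AV_PIX_FMT_YUV420P (as in OpenEncoder above), a minimal version could look like this; treat it as my sketch, not the original code:

// Hypothetical sketch of copyFrame: copy the three I420 planes into the AVFrame,
// honoring the destination line sizes (the AVFrame rows are 32-byte aligned).
void FFmpegH264EncoderImpl::copyFrame(AVFrame *frame, webrtc::I420BufferInterface *buffer) {
    for (int row = 0; row < buffer->height(); ++row) {
        memcpy(frame->data[0] + row * frame->linesize[0],
               buffer->DataY() + row * buffer->StrideY(), buffer->width());
    }
    const int chroma_height = (buffer->height() + 1) / 2;
    const int chroma_width = (buffer->width() + 1) / 2;
    for (int row = 0; row < chroma_height; ++row) {
        memcpy(frame->data[1] + row * frame->linesize[1],
               buffer->DataU() + row * buffer->StrideU(), chroma_width);
        memcpy(frame->data[2] + row * frame->linesize[2],
               buffer->DataV() + row * buffer->StrideV(), chroma_width);
    }
}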

Next is the process of converting the NAL units and extracting the RTP fragmentation information:

// Helper method used by FFmpegH264EncoderImpl::Encode.
// Copies the encoded bytes from |packet| to |encoded_image| and updates the
// fragmentation information of |frag_header|. The |encoded_image->_buffer| may
// be deleted and reallocated if a bigger buffer is required.
//
// After FFmpeg encoding, the encoded bytes in |packet| consist of a number of
// "NAL units", each preceded by a three-byte {0,0,1} or four-byte {0,0,0,1}
// start code. Every NAL unit is copied to |encoded_image->_buffer| behind a
// normalized four-byte start code, and |frag_header| is updated to point to
// each fragment, with offsets and lengths set so as to exclude the start codes.
void FFmpegH264EncoderImpl::RtpFragmentize(webrtc::EncodedImage *encoded_image,
std::unique_ptr<uint8_t[]> *encoded_image_buffer,
const webrtc::VideoFrameBuffer &frame_buffer, AVPacket *packet,
webrtc::RTPFragmentationHeader *frag_header) {
std::list<int> data_start_index;
std::list<int> data_length;
int payload_length = 0;
// Walk the packet, handling both 001 and 0001 start codes, and record each NALU's data start index and data length
for (int i = 2; i < packet->size; i++) {
if (i > 2
&& packet->data[i - 3] == start_code[0]
&& packet->data[i - 2] == start_code[1]
&& packet->data[i - 1] == start_code[2]
&& packet->data[i] == start_code[3]) {
if (!data_start_index.empty()) {
data_length.push_back((i - 3 - data_start_index.back()));
}
data_start_index.push_back(i + 1);
} else if (packet->data[i - 2] == start_code[1] &&
packet->data[i - 1] == start_code[2] &&
packet->data[i] == start_code[3]) {
if (!data_start_index.empty()) {
data_length.push_back((i - 2 - data_start_index.back()));
}
data_start_index.push_back(i + 1);
}
}
if (!data_start_index.empty()) {
data_length.push_back((packet->size - data_start_index.back()));
}

for (auto &it : data_length) {
payload_length += it;
}
// Calculate minimum buffer size required to hold encoded data.
auto required_size = payload_length + data_start_index.size() * 4;
if (encoded_image->_size < required_size) {
// Increase buffer size. Allocate enough to hold an unencoded image, this
// should be more than enough to hold any encoded data of future frames of
// the same size (avoiding possible future reallocation due to variations in
// required size).
encoded_image->_size = CalcBufferSize(
webrtc::VideoType::kI420, frame_buffer.width(), frame_buffer.height());
if (encoded_image->_size < required_size) {
// Encoded data > unencoded data. Allocate required bytes.
WEBRTC_LOG("Encoding produced more bytes than the original image data! Original bytes: " +
std::to_string(encoded_image->_size) + ", encoded bytes: " + std::to_string(required_size) + ".",
WARNING);
encoded_image->_size = required_size;
}
encoded_image->_buffer = new uint8_t[encoded_image->_size];
encoded_image_buffer->reset(encoded_image->_buffer);
}
// Iterate layers and NAL units, note each NAL unit as a fragment and copy
// the data to |encoded_image->_buffer|.
int index = 0;
encoded_image->_length = 0;
frag_header->VerifyAndAllocateFragmentationHeader(data_start_index.size());
for (auto it_start = data_start_index.begin(), it_length = data_length.begin();
it_start != data_start_index.end(); ++it_start, ++it_length, ++index) {
memcpy(encoded_image->_buffer + encoded_image->_length, start_code, sizeof(start_code));
encoded_image->_length += sizeof(start_code);
frag_header->fragmentationOffset[index] = encoded_image->_length;
memcpy(encoded_image->_buffer + encoded_image->_length, packet->data + *it_start,
static_cast<size_t>(*it_length));
encoded_image->_length += *it_length;
frag_header->fragmentationLength[index] = static_cast<size_t>(*it_length);
}
}
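Two small pieces referenced above are not shown in the article: the start_code array and ConvertToVideoFrameType. My best-guess versions, consistent with how they are used (the comment above states the four-byte start code, and only I-frames are marked as key frames in Encode), would be:

// Assumed definitions, consistent with the usage above (not from the original post).
static const uint8_t start_code[4] = {0, 0, 0, 1};

// Map FFmpeg's frame classification onto WebRTC's frame types: I-frames become
// key frames, everything else a delta frame.
static webrtc::FrameType ConvertToVideoFrameType(AVFrame *frame) {
    return frame->pict_type == AV_PICTURE_TYPE_I ? webrtc::kVideoFrameKey
                                                 : webrtc::kVideoFrameDelta;
}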

Finally, the encoder release process, which is very simple:

int32_t FFmpegH264EncoderImpl::Release() {
while (!encoders_.empty()) {
CodecCtx *encoder = encoders_.back();
CloseEncoder(encoder);
encoders_.pop_back();
}
configurations_.clear();
encoded_images_.clear();
encoded_image_buffers_.clear();
return WEBRTC_VIDEO_CODEC_OK;
}

void FFmpegH264EncoderImpl::CloseEncoder(FFmpegH264EncoderImpl::CodecCtx *ctx) {
if (ctx) {
if (ctx->context) {
avcodec_close(ctx->context);
avcodec_free_context(&(ctx->context));
}
if (ctx->frame) {
av_frame_free(&(ctx->frame));
}
if (ctx->pkt) {
av_packet_free(&(ctx->pkt));
}
WEBRTC_LOG("Close encoder context and release context, frame, packet", INFO);
delete ctx;
}
}

This wraps up my account of working with WebRTC; I hope my experience can be of help. If you made it all the way through, I really appreciate it; at times I felt this article had grown too long and covered too much. However, the individual parts are tightly interlinked, and splitting them apart risked breaking the train of thought, so I organized the text around one regular usage flow, introduced my modifications along the way, and finally described my changes to the WebRTC native APIs in detail as add-ons.
Also, I have only recently started writing articles to share my experience, so my wording may not always be precise; please bear with me. If you spot anything I got wrong, please leave a comment and I will fix it as soon as I can.
