The problem is in the content: the playlist does not match the real video, and the mismatch builds up as we progress through the movie for 23.976fps (24000/1001) content. 25fps content should be fine.
Proof:
capture from <redacted tv show>:
The playlist indicates
#EXTM3U
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-KEY:METHOD=AES-128,URI="<redacted URL>"
#EXTINF:11,
"<redacted URL>"
#EXTINF:10,
"<redacted URL>"
#EXTINF:10,
"<redacted URL>"
etc...
Now looking at the timestamps at the start of each segment:
First segment looks correct: the first pts is 0, and the pts increment of 3753.75 (logged as 3754) indicates 24000/1001 fps
[mp.m2ts.parse.samp.pes.slice] New Slice ******>>>> st 0, et 11000,
[mp.m2ts.parse.samp.pes.orig] ts_sample video valid 3447 bytes, pts 0/90000
[mp.m2ts.parse.samp.pes.orig] ts_sample video valid 18 bytes, pts 3754/90000
[mp.m2ts.parse.samp.pes.orig] ts_sample video valid 18 bytes, pts 7508/90000
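As a quick check of the frame rate reading above, here is a minimal sketch that converts a frame rate into its 90 kHz pts increment and guesses the rate from logged pts values. The function names and the candidate list are ours for illustration, not part of the parser, and the sketch assumes the logged samples are in presentation order.

```python
from fractions import Fraction

PTS_CLOCK_HZ = 90000  # MPEG-TS pts values are expressed in 90 kHz ticks

# Candidate frame rates to compare against (illustrative list, not exhaustive)
CANDIDATE_RATES = {
    "24000/1001": Fraction(24000, 1001),
    "24": Fraction(24),
    "25": Fraction(25),
    "30000/1001": Fraction(30000, 1001),
}

def ticks_per_frame(fps):
    """Exact pts increment between consecutive frames at the given rate."""
    return PTS_CLOCK_HZ / fps

def guess_frame_rate(pts_values):
    """Pick the candidate whose per-frame increment is closest to the observed one."""
    deltas = [b - a for a, b in zip(pts_values, pts_values[1:])]
    mean_delta = Fraction(sum(deltas), len(deltas))
    return min(
        CANDIDATE_RATES,
        key=lambda name: abs(ticks_per_frame(CANDIDATE_RATES[name]) - mean_delta),
    )

print(float(ticks_per_frame(Fraction(24000, 1001))))  # 3753.75
print(float(ticks_per_frame(Fraction(25))))           # 3600.0
print(guess_frame_rate([0, 3754, 7508]))              # 24000/1001
```

The logged increment of 3754 is the integer rounding of 3753.75, which is why the first segment points to 24000/1001 rather than a flat 24 or 25 fps.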
2nd segment: according to the playlist, this segment should start at 11s, but the first timestamp indicates a time of 1081081/90000 = 12.012 sec. This is not too bad at this point, but we can foresee a buildup
New Slice ******>>>> st 11000, et 21000, br 2128000, spl 0 size : 320
ts_sample video valid 16354 bytes, pts 1081081/90000
ts_sample video valid 8320 bytes, pts 1084835/90000
ts_sample video valid 7818 bytes, pts 1088589/90000
3rd segment: expected 21 sec, got 1981982/90000 = 22.022 sec, so the gap is increasing
New Slice ******>>>> st 21000, et 31000, br 2128000, spl 0 size : 318
ts_sample video valid 16354 bytes, pts 1981982/90000
Now if we jump to a segment at 1h07 into the movie: expected 4021 sec, got 362342342/90000 = 4026.026 sec, so more than 5 sec difference
New Slice ******>>>> st 4021000, et 4031000, br 2128000, spl 0 size : 330
ts_sample video valid 16354 bytes, pts 362342342/90000
ts_sample video valid 6204 bytes, pts 362346096/90000
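The buildup can be tabulated directly from the captures above. This is only a sketch: the (playlist start, first pts) pairs are copied from the logs, and the helper name is ours.

```python
PTS_CLOCK_HZ = 90000  # pts ticks per second

# (segment start per playlist in seconds, first video pts in that segment),
# values taken from the "New Slice" and ts_sample lines quoted above
OBSERVATIONS = [(0, 0), (11, 1081081), (21, 1981982), (4021, 362342342)]

def drift_seconds(playlist_start_s, first_pts):
    """Actual start time derived from pts minus the start time the playlist implies."""
    return first_pts / PTS_CLOCK_HZ - playlist_start_s

for start_s, pts in OBSERVATIONS:
    actual_s = pts / PTS_CLOCK_HZ
    print(f"playlist {start_s:>5d}s  actual {actual_s:9.3f}s  drift {drift_seconds(start_s, pts):+.3f}s")

# Real length of the 2nd segment, which the playlist lists as 10 sec
second_segment_s = (1981982 - 1081081) / PTS_CLOCK_HZ
print(f"2nd segment lasts {second_segment_s:.3f}s")
```

Each segment listed as 10 sec really lasts about 10.01 sec, so the drift grows by roughly 10 ms per segment on top of the 1 sec error already present in the first segment.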
In our parser, when we detect a difference of more than 5 sec between the expected TS and the received TS, we assume there is a discontinuity in the stream and we correct the TS (we have many streams with discontinuities and must deal with them). In this case, we will bring it back to match 4021, but this will cause a mismatch between caption and video.
So when seeking to 1h07, we adjust the TS and we see a caption mismatch of ~5 sec. However, if we keep playing the content continuously, we will not do the correction and the captions will remain in sync
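To make the seek case concrete, here is a rough sketch of the threshold logic described above. It is not our actual parser code: the function name, the return shape, and the way the expected time is supplied are assumptions for illustration. On a seek the expected time comes from the playlist; for continuous playback we assume it comes from the end of the previous segment in stream time.

```python
DISCONTINUITY_THRESHOLD_S = 5.0  # threshold quoted above

def corrected_start(expected_s, received_s, threshold_s=DISCONTINUITY_THRESHOLD_S):
    """Return (start time to use, offset applied to the video timestamps).

    If the received time is further than threshold_s from the expected time,
    treat it as a discontinuity and rebase onto the expected time.
    """
    if abs(received_s - expected_s) > threshold_s:
        return expected_s, expected_s - received_s
    return received_s, 0.0

# Seek to 1h07: playlist says 4021, stream says 362342342/90000
start_s, offset_s = corrected_start(4021.0, 362342342 / 90000)
print(start_s, round(offset_s, 3))   # 4021.0 -5.026

# Seek to the 3rd segment: the gap is only ~1 sec, so nothing is corrected
start_s, offset_s = corrected_start(21.0, 1981982 / 90000)
print(round(start_s, 3), offset_s)   # 22.022 0.0
```

The -5.026 sec offset is applied to the video timestamps only, which is where the ~5 sec caption mismatch after the seek comes from.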
The playlist is used to compute the movie duration and to perform accurate seeks, so it should be fixed to match the real content
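One possible way to regenerate the durations, sketched under the assumption that the first video pts of every segment is known and that there are no gaps between segments. The helper name is ours. Fractional EXTINF values require EXT-X-VERSION 3 or higher, and EXT-X-TARGETDURATION would have to be raised to cover the longest segment.

```python
PTS_CLOCK_HZ = 90000  # pts ticks per second

def extinf_lines(segment_start_pts):
    """Build EXTINF tags from the first pts of consecutive segments.

    A segment's duration is taken as the distance to the next segment's
    first pts, so the last entry in the list only closes the previous one.
    """
    lines = []
    for start, nxt in zip(segment_start_pts, segment_start_pts[1:]):
        duration_s = (nxt - start) / PTS_CLOCK_HZ
        lines.append(f"#EXTINF:{duration_s:.3f},")
    return lines

# First pts of the first three segments, from the captures above
for line in extinf_lines([0, 1081081, 1981982]):
    print(line)
# #EXTINF:12.012,
# #EXTINF:10.010,
```

With durations like these, the start times computed from the playlist line up with the pts in the stream, and the 5 sec discontinuity check is no longer triggered by a seek.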