Decoding TV Teddy – Part Two: Programming and Audio Output
As a purely fun and academic exercise, I’m going to attempt to decode the TV Teddy audio track embedded in a TV Teddy video programme and output the audio as a separate file. I’ll then try to play back both the audio file and the YouTube video in sync to enjoy this particular TV Teddy episode with full dialogue for the first time.
I’ll be using the Python 3.7 programming language for this project, so I’ve started a new project in my favourite Python development environment, JetBrains PyCharm, which has a free community edition as well as a commercial edition.
I’ll also need to download the OpenCV video file library (which can read MP4 format video files) using:
pip install opencv-python
Next, I capture (using QuickTime on my Mac) the first couple of minutes of video from a TV Teddy show on YouTube. I’ve chosen this one:
I’ve renamed the file to make it easier to access and created a folder called ‘media’ in the project, saving the file there as ‘media/TVTeddyExcerpt.mp4’.
To get started I’ve used the working Python OpenCV coding example at: https://stackoverflow.com/questions/33311153/python-extracting-and-saving-video-frames and adapted it thus:
import cv2

print('TV Teddy Frame Extractor - CV2 version = ', cv2.__version__)
vidcap = cv2.VideoCapture('media/TVTeddyExcerpt.mp4')
success, image = vidcap.read()
count = 0
success = True
while success:
    success, image = vidcap.read()
    count += 1
    if count % 1000 == 0:
        print('count of frames so far =', count)
        cv2.imwrite("media/frame%d.jpg" % count, image)  # save frame as JPEG file
print('total count of frames =', count)
In the above code, I am simply reading the video file and saving a frame to disk once every 1,000 frames.
I ran the code for a few seconds and made it save a few frames to make sure it was working. Frame 1000 looks like this:
From this frame I note:
This video is in 640 x 476 format.
Looking at the embedded audio track in an image editor, I found that its centre is 5 pixels in from the left.
The top of the grey soundtrack starts at vertical pixel 2, and the bottom ends at vertical pixel 472 (beyond those extremes there appear to be distortions at the start and end of the frame, probably caused by the VHS player switching between its rotary heads as it reads the tape).
So my plan is to:
Read each frame, and do this:
Set a loop running down the grey audio line and, at each position (5, 2) through to (5, 472), do this:
Read the RGB values of the pixel and add the three values together, subtracting 127 from each so that the waveform centres on value 0 rather than value 127. Each channel is sampled between 0 and 255, so subtracting 127 shifts its range to between -127 and 128. ‘Silence’ (denoted by grey rather than black or white) will then be recorded at 0 instead of 127. Summing the three R, G, and B samples helps preserve any subtlety that survives digitisation in the change of greyness from sample to sample. I’ll save this value as a 16-bit signed integer.
Append the value to the end of an array which later is saved to the WAV file.
Normalise the array of audio samples; that is, ‘amplify’ the sample values and re-centre them around the ‘0’ centre line.
Save the array of samples in a WAV file set to play back at 30 fps x (700 - 16) frame-lines per second = 20,520 Hz (about 20.5 kHz) sample rate with a 16-bit sample size.
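The pixel-to-sample step of that plan, centring each channel on zero and summing the three, can be sketched as a tiny helper (a minimal illustration of the arithmetic, not the final extractor code):

```python
def pixel_to_sample(bgr):
    """Centre each 0-255 channel value on zero by subtracting 127,
    then sum the three channels into one signed audio sample."""
    return sum(int(channel) - 127 for channel in bgr)

# Mid-grey 'silence' lands on zero; pure white and black give the extremes
print(pixel_to_sample((127, 127, 127)))  # → 0
print(pixel_to_sample((255, 255, 255)))  # → 384
print(pixel_to_sample((0, 0, 0)))        # → -381
```

The summed value can only ever range from -381 to 384, which fits comfortably inside a 16-bit signed integer, hence the choice of sample size.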
Now be warned about a patent: US patent 5,808,869, “Method and apparatus for nesting secondary signals within a television signal” (and its international equivalents of the same name), owned by Shoot The Moon, who, I have just discovered, invented the TV Teddy technology in the first place. You may not be able to use this code – and certainly not in any commercial context – without their permission. Shoot The Moon have every right to earn from this patented idea until it expires. If you think you would enjoy creating new content compatible with TV Teddy, or decoding TV Teddy videos to use with other equipment, great! But contact Shoot The Moon via their website and agree some sort of licensing first. The code below is provided purely for your academic interest and self-education in Python programming.
import cv2
import array
import wave

# A flag used to find out if the next video frame was read successfully
success = True
# This is the output WAV file's sample rate that will be written into its header information
wavSampleRate = 20483
# Each WAV file sample is 2 bytes (16-bit)
wavSampleSize = 2
# The WAV file will have a single mono audio channel
wavChannels = 1
# An array that will store all samples in a 16-bit signed integer format
sampleArray = array.array('h')
# The top and bottom video lines in the video frame where we will measure the greyscale to get samples.
# Increase audioLineModulationStartLine and decrease audioLineModulationEndLine until the loud 60Hz
# buzz disappears from the finished audio
audioLineModulationStartLine = 16
audioLineModulationEndLine = 460
# The horizontal position in the video frame where we will take the sample - ideally set to be in the
# centre of the greyscale audio line
audioLineCentrePixelToRead = 5

print('TV Teddy Audio Extractor from YouTube 720p Source - CV2 version = ', cv2.__version__)

frameCount = 0
# Open the video file
vidcap = cv2.VideoCapture('media/TVTeddyExcerpt.mp4')
if vidcap.isOpened():
    # Get some info on the file
    width = vidcap.get(cv2.CAP_PROP_FRAME_WIDTH)    # float
    height = vidcap.get(cv2.CAP_PROP_FRAME_HEIGHT)  # float
    fps = vidcap.get(cv2.CAP_PROP_FPS)
    frameCount = int(vidcap.get(cv2.CAP_PROP_FRAME_COUNT))
    print('Incoming Video: Width=', width, ', height=', height, ', fps=', fps, ', framecount=', frameCount)

    # Process the file frame by frame
    for currentFrame in range(0, frameCount):
        success, image = vidcap.read()
        if success:
            # For the current frame, read the grey line and extract a sample from each pixel
            for scanLine in range(audioLineModulationStartLine, audioLineModulationEndLine + 1):
                sampleValue = 0
                for rgb in range(0, 3):
                    sampleValue += int(image[scanLine, audioLineCentrePixelToRead, rgb])
                sampleArray.append(sampleValue)
        else:
            print('Failed to read frame', currentFrame)
        if currentFrame % 1000 == 0:
            print('count of frames so far =', currentFrame, ' - ', int(currentFrame * 100 / frameCount), "%")

    # Close the incoming video file
    vidcap.release()

print('Total count of frames =', frameCount)
print('Total count of samples =', len(sampleArray))

print('Analysing extracted audio...')
# Find the sum of sample sizes and the maximum sample size
sumSampleSize = 0
maxSampleValue = 0
for sampleIndex in range(0, len(sampleArray)):
    sumSampleSize += sampleArray[sampleIndex]
    if maxSampleValue < sampleArray[sampleIndex]:
        maxSampleValue = sampleArray[sampleIndex]
# Calculate mean average sample size
meanSampleSize = int(sumSampleSize / len(sampleArray))

# Now alter the sample values to become rebalanced around zero based
# on the mean sample size, and amplified by multiplying the samples
# based on the amplifyValue - a process called 'normalisation'
print('Normalising....')
maxSampleValue = maxSampleValue - meanSampleSize
amplifyValue = 16000 / maxSampleValue  # reduce constant value 16000 if the warnings below keep happening
for sampleIndex in range(0, len(sampleArray)):
    # Safety catch should the multiplication make the signed integer too big or too small for the array
    try:
        sampleArray[sampleIndex] = int((sampleArray[sampleIndex] - meanSampleSize) * amplifyValue)
    except OverflowError:
        # sampleArray[sampleIndex] is kept at the same value
        print("Warning: Normalised sample was too large or too small for signed 16-bit array")
        continue

# Write the output WAV file
print('Writing WAV file...')
f = wave.open('media/output.wav', 'w')
f.setparams((wavChannels, wavSampleSize, wavSampleRate, len(sampleArray), "NONE", "Uncompressed"))
f.writeframes(sampleArray.tobytes())  # Important to convert to bytes or only half the audio will be written out
f.close()
print('Completed!')
print('Now use an audio application such as Audacity (free) to read the output WAV file and'
      ' increase the pitch by 100% (i.e. double it)')
Let’s look and listen to the output!
The view from Audacity, the free audio editing application:
The waveform is somewhat shifted and not normalised properly around the zero middle, but we’ve certainly got something! Let me save the file as an MP3 now, as it’s wastefully large as a WAV, and take a listen: https://lansley.com/wp-content/uploads/2018/03/output.mp3
The sample rate seems just about spot on as the audio is keeping time when played back in sync with the video. But, if you watch the demonstration in Databits’s video, the audio pitch is a lot higher than here.
This makes me wonder if that’s something the tech in TV Teddy is doing. So, using Audacity, I’ll apply a ‘High-Quality Pitch Change’ at +100% (doubling the pitch) while keeping the same speed. How does it sound now? https://lansley.com/wp-content/uploads/2018/03/output2.mp3
Near enough! So what’s happened is that, in order to stay within the limited audio bandwidth of this soundtrack, the designers have halved the pitch (but not speed) of the audio during recording, and either the TV Teddy box or TV Teddy receiver inside the bear has taken the audio and doubled the pitch on playback to restore the child-like voice without needing extra audio bandwidth. All clever stuff – and remember this is all done with analogue circuits in the 1990s.
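Out of curiosity, the pitch-doubling trick can be imitated digitally. Here’s a crude granular sketch, which is entirely my own assumption and nothing like the real analogue circuit: chop the audio into short grains, decimate each grain to half length (doubling its pitch, at the cost of some aliasing), then play each grain twice to keep the overall duration.

```python
import array

def double_pitch(samples, grain=440):
    """Crude pitch doubling that keeps the duration: decimate each
    grain to half length, then play it twice. grain should be even."""
    out = array.array('h')
    for start in range(0, len(samples), grain):
        g = samples[start:start + grain]
        half = g[::2]      # every other sample: pitch x2, length halved
        out.extend(half)   # play the shortened grain...
        out.extend(half)   # ...twice, restoring the original duration
    return out

tone = array.array('h', [0, 100, 0, -100] * 250)  # toy 1,000-sample waveform
shifted = double_pitch(tone)
print(len(tone), len(shifted))  # → 1000 1000
```

Notably, this naive method introduces its own buzz at the grain boundaries, much like the frame-to-frame buzz in the extracted audio.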
The distortions in the audio will come from the digitisation of the VHS tape, where subtle differences between continuous analogue grey levels are lost in the sampling of each video frame. Also, bear in mind that either the analogue-to-digital video converter or YouTube will take an interlaced video and create a progressive version of each frame; in terms of our audio soundtrack, each ‘odd’ sixtieth-of-a-second field has been overlaid on the next ‘even’ sixtieth of the audio. Digital compression (including YouTube’s) will also have played its part in damaging this subtlety. The ‘buzzing’ effect on the voice is caused by the jump from frame to frame. I’m guessing that the not-exactly-high-fidelity speaker inside TV Teddy’s body reduces the obviousness of this effect for its intended listeners – or there’s a high-pass filter in the circuit blocking low frequencies before the pitch-doubling takes place.
Well, now I feel highly satisfied with that effort! I can certainly try to improve the audio and tune the sample rate (playback seems a bit fast, so lowering it would help), but we’ve decoded TV Teddy successfully.
Footnote: Now, of course, you’re going to ask: Would it be possible to create a video with a TV Teddy compatible soundtrack?
The answer is Yes! On to Part Three…