QTJ: Failed E-Podcast Experiment as Slideshow Example

by Chris Adamson


I was experimenting with doing an enhanced podcast. It's not really panning out and I have way better stuff to do, but I saw two important things along the way that I wanted to post, if only to get them into Google.



Enhanced What?



An "Enhanced Podcast" is Apple's term for a podcast that offers some images and chapter marks. Back up: a podcast is basically audio files and an RSS feed, so you can subscribe with an RSS client and get informed when there's a new audio file to pull down. In an enhanced podcast, this file adds images and chapters to the audio.



When played in iTunes, the images appear in the cover art viewer panel. On an iPod Photo, they appear on screen.



MAKE magazine's blog has a great enhanced podcast how-to. If you look on iTunes, you can also see at least one recent show they did as an enhanced podcast (the one on making a charging cable for the Sony PSP). The how-to combines the information that comes with Apple's "ChapterTool" (a beta tool for making enhanced podcasts) with MAKE's own experiences.



Key gotcha: the files have to be MPEG-4 containers, so the audio needs to be AAC. Among other downsides, QuickTime for Windows doesn't export AAC audio (even if you buy QT Pro? I'm not sure), so only Macs can create these for now. FWIW, I don't expect that to last -- people won't buy Macs just to create enhanced podcasts, but letting Windows-based authors create enhanced podcasts would probably lead to more iPod-only enhanced podcasts, which would sell more iPods. And Apple probably gets about the same margin off iPods as off consumer Macs anyway....



What Does This Have To Do With QuickTime?


I figured I had you at "chapter track"... who else does chapter tracks but QuickTime? And a video track whose samples can have arbitrary durations, making it exceptionally well suited to slideshows? That's very QuickTime-y.



In fact, if you look inside the m4b file with something like QuickTime Atomizer, or the QuickTime File Format parser I did for an ONJava article a few years ago, or even HexEdit (c'mon, be hardcore, you know you want to), you'll see the insides look a lot more like QuickTime than like MPEG-4. There's no MPEG-4 Initial Object Descriptor atom (iods), nor tracks for the object and scene descriptors (media of types odsm and sdsm).
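
If you want to poke around without any of those tools, walking the top-level atoms takes only a few lines of plain Java. Here's a minimal sketch -- it naively assumes 32-bit atom sizes (no 64-bit extended sizes, which it just bails on) and doesn't recurse into container atoms like moov:

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;

public class AtomDump {
    public static void main (String[] args) throws IOException {
        DataInputStream in =
            new DataInputStream (new FileInputStream (args[0]));
        long offset = 0;
        while (true) {
            int size;
            try {
                size = in.readInt();        // 4-byte big-endian atom size
            } catch (EOFException eof) {
                break;                      // clean end of file
            }
            byte[] type = new byte[4];      // 4-char atom type, e.g. "moov"
            in.readFully (type);
            System.out.println (offset + ": " +
                                new String (type, "ISO-8859-1") +
                                " (" + size + " bytes)");
            if (size < 8)
                break;                      // extended/zero sizes not handled here
            // jump to the next sibling (skipBytes may skip less than asked)
            int toSkip = size - 8;
            while (toSkip > 0)
                toSkip -= in.skipBytes (toSkip);
            offset += size;
        }
        in.close();
    }
}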



What's there instead is an audio track, a video track, and two text tracks. Moreover, the video track's samples are plain JPEGs or PNGs (look for their magic strings in the mdat atom), and I don't think MPEG-4 even has a text track. The two text tracks are the chapter track (just a text track whose samples are the chapter names, with a track reference from the video track to the text track -- see Ice Floe #3) and an HREF track for links to the MAKE site (an HREF track is just a slightly special text track -- see chapter 9 in QuickTime for Java: A Developer's Notebook).
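
By the way, the chapter linkage itself is a one-liner in QTJ: a 'chap' track reference from the main track to the text track. A minimal sketch (the class, method, and track names here are my placeholders, not anything from the MAKE file):

import quicktime.QTException;
import quicktime.std.StdQTConstants;
import quicktime.std.movies.Track;

public class ChapterLinker {
    // Marks chapterTextTrack as the chapter list for mainTrack,
    // i.e. the track reference described in Ice Floe #3.
    public static void linkChapters (Track mainTrack, Track chapterTextTrack)
        throws QTException {
        mainTrack.addReference (chapterTextTrack,
                                StdQTConstants.kTrackReferenceChapterList);
        chapterTextTrack.setEnabled (false); // chapter tracks usually don't render
    }
}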



On the other hand, the strings are all null-terminated, which is very MPEG-4-ish. Usually, QuickTime strings use the Pascal-like convention of one byte of length followed by a run of characters.
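
In code terms, the difference is just where the length lives. A quick illustration (plain Java, nothing QTJ-specific):

public class AtomStrings {
    // Classic QuickTime: one length byte, then that many characters
    static String readPascalString (byte[] buf, int off) {
        int len = buf[off] & 0xFF;
        return new String (buf, off + 1, len);
    }

    // MPEG-4 style: characters up to a terminating NUL byte
    static String readCString (byte[] buf, int off) {
        int end = off;
        while (end < buf.length && buf[end] != 0)
            end++;
        return new String (buf, off, end - off);
    }
}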



So Why Not Just Export This As MPEG-4?


Because that turns the image samples into a real MPEG-4 video track, and I'd be very surprised if the iPod Photo could play that. Also, QuickTime doesn't export the text tracks. And it doesn't export the audio on Windows.



So What's This Code?


What I was trying to do was build up the QuickTime equivalent of this enhanced podcast, then see if I could hack it into an MPEG-4 by just switching the file extension or copying over the ftyp atom. That's not completely implausible, since MPEG-4's file format is extremely similar to QuickTime's.
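
For the record, the atom-patching half of that hack is only a few lines with RandomAccessFile. A sketch of what I mean -- it assumes ftyp is the first atom, and the "M4A " major brand is just an illustrative guess (and as you'll see below, the whole idea didn't pan out anyway):

import java.io.RandomAccessFile;

public class FtypPatch {
    public static void main (String[] args) throws Exception {
        RandomAccessFile raf = new RandomAccessFile (args[0], "rw");
        raf.seek (4);                       // skip the 4-byte atom size
        byte[] type = new byte[4];
        raf.readFully (type);
        if (new String (type, "ISO-8859-1").equals ("ftyp")) {
            // the 4 bytes right after the type are the major brand
            raf.write ("M4A ".getBytes ("ISO-8859-1"));
        }
        raf.close();
    }
}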



But that didn't work, and I don't want to mess around with it much longer.



Still, what I have is potentially useful as a slide show example. It copies the MPEG-4 audio track from a selected file into a new movie (take out the "check for MPEG-4" stuff and it would work with any QT-friendly audio), then takes three images from known locations (images/chap1.png, images/chap2.png, and images/chap3.png) and makes a video track from them, placing them at times 0, duration*0.33, and duration*0.67 (i.e., each covers a third of the movie, having a duration of audioDuration/3). Then it flatten()s, so the sound and all the images are in the same file and you can mail it to all your QuickTime-loving friends.




import quicktime.*;
import quicktime.std.*;
import quicktime.std.image.*;
import quicktime.std.movies.*;
import quicktime.std.movies.media.*;
import quicktime.io.*;
import quicktime.util.*;

import java.io.File;

public class EPodcastTest extends Object {

    // 1-based count of "images/chapn.png" files,
    // where 1 <= n <= CHAPTER_COUNT
    public static final int CHAPTER_COUNT = 3;

    public static void main (String arrrImAPirate[]) {
        try {
            QTSession.open();
            // open a movie
            QTFile file =
                QTFile.standardGetFilePreview (QTFile.kStandardQTFileTypes);
            OpenMovieFile omFile = OpenMovieFile.asRead (file);
            Movie audioMovie = Movie.fromFile (omFile);

            // find audio track
            System.out.println (audioMovie.getTrackCount() + " tracks");
            Track oldAudioTrack =
                audioMovie.getIndTrackType (1,
                                            StdQTConstants.audioMediaCharacteristic,
                                            StdQTConstants.movieTrackCharacteristic);
            if (oldAudioTrack == null) {
                System.out.println ("Didn't find audio track - bye");
                System.exit (-1);
            }
            System.out.println ("found audio track");
            if (oldAudioTrack.getMedia().getSampleDescription(1).getDataFormat() !=
                QTUtils.toOSType ("mp4a")) {
                System.out.println ("Audio track is not mpeg-4 audio - bye");
                System.exit (-1);
            }
            System.out.println ("Found MPEG-4 audio track");

            // create new movie
            QTFile podMovieFile = new QTFile (new File ("podmovie.mov"));
            Movie podMovie =
                Movie.createMovieFile (podMovieFile,
                                       StdQTConstants.kMoviePlayer,
                                       StdQTConstants.createMovieFileDeleteCurFile |
                                       StdQTConstants.createMovieFileDontCreateResFile);

            // copy audio track
            Track podAudioTrack =
                podMovie.newTrack (0.0f, // width
                                   0.0f, // height
                                   oldAudioTrack.getVolume());
            // note how the data ref writes back to the podMovie
            SoundMedia newMedia =
                new SoundMedia (podAudioTrack,
                                oldAudioTrack.getMedia().getTimeScale(),
                                DataRef.fromMovie (podMovie));

            podAudioTrack.getMedia().beginEdits();
            oldAudioTrack.insertSegment (podAudioTrack,
                                         0,
                                         oldAudioTrack.getDuration(),
                                         0);
            podAudioTrack.getMedia().endEdits();

            // add images as video track
            Track podVideoTrack =
                podMovie.newTrack (300.0f, 300.0f, 0f);
            VideoMedia podVideoMedia =
                new VideoMedia (podVideoTrack, podMovie.getTimeScale());
            podVideoMedia.beginEdits();
            File imagesDir = new File ("images");
            for (int i = 1; i <= CHAPTER_COUNT; i++) {
                QTFile imageFile = new QTFile (
                    new File (imagesDir, "chap" + i + ".png"));
                System.out.println ("open " + imageFile.getPath());
                GraphicsImporter gi = new GraphicsImporter (imageFile);
                ImageDescription id = gi.getImageDescription();
                System.out.println ("ImageDescription: " + id);

                // add a sample to the video track
                int dataSize = gi.getDataSize();
                System.out.println ("data size is " + dataSize);
                RawEncodedImage rei =
                    new RawEncodedImage (dataSize, false);
                gi.readData (rei, 0, dataSize);
                System.out.println ("read image");
                // argh, addSample wants QTHandleRef, but
                // RawEncodedImage is a QTPointer. copying the
                // bytes seems an unfortunate answer
                QTHandle imageHdl = new QTHandle (rei.getBytes());
                int sampleFlags = 0; // don't set mediaSampleNotSync
                podVideoMedia.addSample (imageHdl, // data handle
                                         0, // offset
                                         dataSize, // size
                                         podMovie.getDuration() / 3, // duration
                                         id, // sample description
                                         1, // num samples
                                         sampleFlags); // sampleFlags
            }
            podVideoMedia.endEdits();
            podVideoTrack.insertMedia (0, // trackStart
                                       0, // mediaTime
                                       podVideoMedia.getDuration(), // mediaDuration
                                       1); // mediaRate
            System.out.println ("inserted media into video track");

            System.out.println ("flattening");
            podMovie.flatten (StdQTConstants.flattenAddMovieToDataFork |
                              StdQTConstants.flattenForceMovieResourceBeforeMovieData,
                              new QTFile (new File ("flatmovie.mov")),
                              StdQTConstants.kMoviePlayer,
                              IOConstants.smSystemScript,
                              StdQTConstants.createMovieFileDeleteCurFile,
                              StdQTConstants.movieInDataForkResID,
                              null); // resName

            System.out.println ("Done");

        } catch (QTException qte) {
            qte.printStackTrace();
        } finally {
            QTSession.close();
            System.exit(0);
        }
    } // main

}


And The Useful Parts?


Two things here would be interesting if you were trying to apply techniques from the book and got stuck -- the book's examples generally do one thing at a time with a movie, and this program is weird because it copies media with Track.insertSegment() and then adds samples with Media.addSample().



Useful Part One: Using The Movie File As The DataRef For New Media


In the book, the "add a track" stuff deals with movies that already exist or are created with new Movie(). For this slideshow maker, I created a movie with the createMovieFile() method:

Movie podMovie =
    Movie.createMovieFile (podMovieFile,
                           StdQTConstants.kMoviePlayer,
                           StdQTConstants.createMovieFileDeleteCurFile |
                           StdQTConstants.createMovieFileDontCreateResFile);


The gotcha is that at first I created the SoundMedia in a way that fails when you flatten. It used the approach of providing a bogus in-memory DataRef for storing the media (see the Apple Q&A "BeginMediaEdits -2050 badDataRefIndex error after calling NewMovie"):




SoundMedia newMedia =
    new SoundMedia (podAudioTrack,
                    oldAudioTrack.getMedia().getTimeScale(),
                    new DataRef (new QTHandle()));


Like I said, this sucks because you get an eofErr when you flatten. It works better to tell it "no, really, store the media in the movie file I just created":




SoundMedia newMedia =
    new SoundMedia (podAudioTrack,
                    oldAudioTrack.getMedia().getTimeScale(),
                    DataRef.fromMovie (podMovie));


In fact, it's simpler to omit the DataRef altogether, unless you know you need it (e.g., you used new Movie() instead of createMovieFile()). So this works too:




SoundMedia newMedia =
    new SoundMedia (podAudioTrack,
                    oldAudioTrack.getMedia().getTimeScale());


Useful Part Two: Storing Slide Show Pictures In The Video Track



In the book, I show the more advanced way of laying down a video track with raw samples, using a CSequence so you can pick up temporal compression. That approach puts original frames in a GWorld and compresses each one. For a slide show, you wouldn't want to re-compress your images if they're already in a format like JPEG or PNG; you just want to add them straight into the VideoMedia, which is possible.



So, the book's approach is (condensed into a code sketch after this list):



  1. Import image with a GraphicsImporter

  2. Set up a CSequence. Get an ImageDescription from this

  3. Draw some part of the imported image into a GWorld

  4. Compress the GWorld into a frame.

  5. addSample() with the handle returned from the compress call and the ImageDescription from the CSequence
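
Condensed into code, that route looks roughly like this -- the Animation codec, the quality constants, and the key-frame rate here are illustrative choices, not necessarily what you'd ship (the book has the real, complete version):

import quicktime.QTException;
import quicktime.qd.QDGraphics;
import quicktime.qd.QDRect;
import quicktime.std.StdQTConstants;
import quicktime.std.image.*;
import quicktime.std.movies.media.VideoMedia;
import quicktime.util.QTHandle;
import quicktime.util.RawEncodedImage;

public class CSequenceSketch {
    // Draws one imported image into a GWorld, compresses it, and
    // adds the compressed frame to videoMedia.
    public static void addCompressedFrame (GraphicsImporter gi,
                                           VideoMedia videoMedia,
                                           int frameDuration)
        throws QTException {
        QDRect bounds = new QDRect (300, 300);
        QDGraphics gw = new QDGraphics (bounds);   // offscreen GWorld

        // worst-case buffer for one compressed frame
        int rawSize = QTImage.getMaxCompressionSize (gw, bounds,
                          gw.getPixMap().getPixelSize(),
                          StdQTConstants.codecNormalQuality,
                          StdQTConstants.kAnimationCodecType,
                          CodecComponent.anyCodec);
        QTHandle imageHandle = new QTHandle (rawSize, true);
        imageHandle.lock();
        RawEncodedImage compressed =
            RawEncodedImage.fromQTHandle (imageHandle);

        CSequence seq = new CSequence (gw, bounds,
                            gw.getPixMap().getPixelSize(),
                            StdQTConstants.kAnimationCodecType,
                            CodecComponent.bestFidelityCodec,
                            StdQTConstants.codecNormalQuality, // spatial
                            StdQTConstants.codecNormalQuality, // temporal
                            10,    // key frame every 10 frames
                            null,  // default color table
                            0);
        ImageDescription desc = seq.getDescription();

        gi.setGWorld (gw, null);   // draw the image offscreen
        gi.draw();
        CompressedFrameInfo info =
            seq.compressFrame (gw, bounds,
                               StdQTConstants.codecFlagUpdatePrevious,
                               compressed);
        boolean keyFrame = (info.getSimilarity() == 0);
        videoMedia.addSample (imageHandle, 0, info.getDataSize(),
                              frameDuration, desc, 1,
                              keyFrame ? 0 : StdQTConstants.mediaSampleNotSync);
    }
}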



To do slides, it's simpler:



  1. Import image with GraphicsImporter

  2. Get an ImageDescription from the importer

  3. Make a QTHandle by copying bytes from the importer

  4. addSample() with this handle and description


BTW, step 3 sucks. There has to be a better way, but this is experimental hackery, so I'm just happy it works.



Speaking of hackery, it creates a useless reference movie file called "podmovie.mov" that it should probably delete. "flatmovie.mov" is the flattened movie with the slideshow.
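
If you want the cleanup, a plain java.io.File delete at the end of main() would do it -- after the flatten(), since by then flatmovie.mov is self-contained (I haven't bothered):

// podmovie.mov was just scaffolding; flatmovie.mov has everything
new File ("podmovie.mov").delete();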



Conclusion



Sheesh. This is long. I should have put it on O'Reilly as a blog or something... maybe I'll do that too... Daddy needs to sell some ads... Anyways, I hope it helps someone at some point.



I'm officially done with trying to hack enhanced podcasts for now. Maybe Apple will give us a nice MovieExporter for doing them at some point, and then we could build a nice Java GUI for capturing and editing the sound, arranging the pictures, etc.