fetching mp3s

So far so good. The next exercise is to actually make this script useful by fetching the MP3s and arranging them into easy-to-understand directories.

The first order of business is to determine the name for the folder. I chose to use the channel name, and if that fails, the title of this episode. The latter isn't a very good choice, as we could easily end up with a directory for each item in the feed.

function getmp3() {
    if [[ "${CHANNEL}" = "" ]] ; then
        CHANNEL="${TITLE}"
    fi
    if [[ "${CHANNEL}" = "" ]] ; then
        echo "Unable to determine the channel name, skipping."
        return
    fi

    # Remove HTML entities, convert spaces to _s and strip non-alphanumeric characters.
    # The entity pattern stops at the first semicolon so that text between
    # two entities is not swallowed.
    dirname=$(echo "${CHANNEL}" | sed -e 's/&[^;]*;//g' -e 's/ /_/g' -e 's/[^a-zA-Z0-9._]//g')
    dirname="${BASEDIR}/${dirname}"

I'm rather ruthless with the special characters. There isn't any good technical reason to remove them, other than that it makes handling the files from the command line easier. My personal preference is to leave spaces and most punctuation, removing single and double quotes, ampersands and semicolons. To keep the regexp here a little cleaner we'll just keep underscores, periods, and all alphanumerics.

I've used three different patterns for sed above. The first removes HTML entities like &amp;, the second converts spaces to _s, and the last removes everything that's not a-z, A-Z, 0-9, period, or underscore.
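To see the three patterns at work outside the script, here is a minimal sketch. The helper name sanitize_name and the sample channel name are made up for illustration and are not part of the script. The entity pattern is written as `&[^;]*;` so that it stops at the first semicolon.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): applies the three
# sed patterns to a single string and prints the result.
sanitize_name() {
    # 1. drop HTML entities such as &amp; (match ends at the first ;)
    # 2. turn each space into an underscore
    # 3. strip anything that is not a letter, digit, period, or underscore
    printf '%s\n' "$1" | sed -e 's/&[^;]*;//g' -e 's/ /_/g' -e 's/[^a-zA-Z0-9._]//g'
}

# Sample input invented for this example.
sanitize_name "Tom &amp; Jerry's Show: Ep. 1"
# prints: Tom__Jerrys_Show_Ep._1
```

Note the double underscore: the entity is removed first, which leaves two adjacent spaces, and each one becomes its own underscore.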

    # Try to make the directory, and at the same time test to see if it exists.
    if ! mkdir -p "${dirname}" ; then
        echo "Unable to make the directory: ${dirname}"
        return
    fi

If you'd rather put all the MP3s in the BASEDIR, then remove all lines between here and the function line above, and change the dirname to BASEDIR in the next line.

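    # Bail out of the function if we cannot enter the directory.
    cd "${dirname}" || return

    # Check the URL, and use alternates if necessary.
    # If you are using an older bash, you may need to use something like this
    # (-q keeps grep from printing the matching URL):
    # if echo "${LINK}" | grep -qi 'http.*mp3' ; then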

Again with the =~: if your bash version doesn't support it, you can accomplish the same thing using the grep in the comment above.
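For comparison, here are the two tests side by side as a rough sketch. The function names and example.com URLs are hypothetical, chosen only for this illustration. The pattern is kept in a variable and expanded unquoted, since bash 3.2 and later treat a quoted right-hand side of =~ as a literal string rather than a regular expression.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative and not from the script.

# Regex match with bash's =~ operator (case-sensitive).
is_mp3_url_regex() {
    local pattern='http.*mp3'
    [[ "$1" =~ $pattern ]]
}

# The same check through grep for shells without =~.
# -q suppresses output, -i makes the match case-insensitive.
is_mp3_url_grep() {
    printf '%s\n' "$1" | grep -qi 'http.*mp3'
}

# Made-up URLs, used only to exercise both checks.
for url in "http://example.com/show/episode1.mp3" "http://example.com/index.html"; do
    if is_mp3_url_regex "$url" && is_mp3_url_grep "$url"; then
        echo "mp3: $url"
    else
        echo "not mp3: $url"
    fi
done
```

The two are not quite identical: the grep version ignores case because of -i, while =~ is case-sensitive unless the nocasematch shell option is set.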

    # The patterns are left unquoted: bash 3.2 and later treat a quoted
    # pattern after =~ as a literal string, not a regular expression.
    if [[ ! "${ENCL}" =~ http.*mp3 ]] ; then
        if [[ "${LINK}" =~ http.*mp3 ]] ; then
            ENCL="${LINK}"
        elif [[ "${HREF}" =~ http.*mp3 ]] ; then
            ENCL="${HREF}"
        fi
    fi
    if [[ "${ENCL}" = "" ]] ; then
        echo "Unable to determine mp3 URL, skipping."
    else
        # -nc is noclobber, means don't fetch if a file of that name exists.

I had tried using -N or -c to check timestamps, or continue if the transfer failed, but after adding or changing id3tags, the file size changes. Using -N, wget would complain of different file sizes, and fetch the entire file again. With the -c option, if the file was made smaller by the id3tags (unlikely), wget would append the last X bytes, corrupting the end of the mp3.

        if ! wget -nc "${ENCL}" ; then
            echo "Error fetching ${ENCL}"
        else
            echo "Channel: ${CHANNEL}"
            echo "Title: ${TITLE}"
            echo "ENCL: ${ENCL}"
            echo ""
        fi
    fi
    # Reset the per-item variables before the next feed item is parsed.
    TITLE=""; LINK=""; DATE=""; ENCL=""; HREF=""
}

podcast.005.sh

That's it for tonight. I have too much other work to do. Hopefully tomorrow I'll get to checking and adding some id3 tags. There are a couple command line utilities for doing so. I will be using id3v2, from: http://id3v2.sourceforge.net/. I picked this utility because it allows writing the TCON tag which appears to be used as the genre field by most mp3 players. id3v2 is also easy to compile with only one dependency (id3lib).
