getID3, ffmpeg and HTML entities


Postby nickharambee » Sat Jun 02, 2012 4:10 am

Hi,

I have a script that uses getID3 to pull ID3 tags from audio files in various formats before converting them to MP3 with ffmpeg. Some characters in the tags, notably ampersands (&) and double quotes ("), are being converted to HTML entities, i.e. &amp; and &quot;. I am using the latest version of getID3. I'm wondering whether this is caused by getID3 or by ffmpeg, and would be glad of any help.

Thanks,

Nick
nickharambee
User
 
Posts: 22
Joined: Sat Mar 05, 2011 4:42 am

Re: getID3, ffmpeg and HTML entities

Postby James Heinrich » Sat Jun 02, 2012 6:19 am

It depends where you're getting the data from.
[comments] and [tags] contain unmodified data
[comments_html] and [tags_html] contain HTML-entity-encoded data.
The rest of the returned data may contain the metadata in an even more raw form, depending on the tag format, but that's not usually of much interest.
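To illustrate the difference between the two forms, here is a small Python sketch using the standard `html` module. It is not getID3 code (getID3 is PHP), and getID3's exact encoding rules may differ in detail; the tag value is hypothetical. It only shows what an entity-encoded value looks like next to the raw one, and how to decode it if the encoded form is all you have.

```python
import html

# Hypothetical raw tag value, comparable to what [comments] would hold
raw = 'bonnie "prince" billy & friends'

# Entity-encoded form, comparable in spirit to [comments_html]
encoded = html.escape(raw)  # escapes &, <, > and (by default) quotes
print(encoded)  # bonnie &quot;prince&quot; billy &amp; friends

# Decoding the entities recovers the original text
decoded = html.unescape(encoded)
print(decoded == raw)  # True
```

In practice it is simpler to read [comments] directly than to decode [comments_html].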

Where are you pulling your data from that you're seeing the html-entity-encoded data?

As always, if you view the analysis of your file in your browser with /demo/demo.browse.php you'll get a good overview of the various types of data that are returned where.
James Heinrich
getID3() v1 developer
 
Posts: 1204
Joined: Fri May 04, 2001 11:00 am
Location: London, ON, Canada

Re: getID3, ffmpeg and HTML entities

Postby nickharambee » Sat Jun 02, 2012 7:43 am

Thanks James. I was indeed using [comments_html] rather than [comments]. I switched to [comments] and ampersands now display correctly. For double quotes I no longer see &quot;, but the quotes are being stripped instead, so the artist bonnie "prince" billy becomes bonnie prince billy. I have checked with demo.browse.php, and the artist field in [comments] has indeed had its quotes stripped during conversion. You can see the results here:

<link redacted for security reasons>

Quotes are of course tricky to handle in code. I'm wondering whether you have any advice on handling quotes when converting with ffmpeg and mapping the metadata extracted by getID3.
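The thread doesn't show how the script invokes ffmpeg, so the following is only a guess at one common cause: if the tag value is pasted into a shell command string wrapped in double quotes, the shell treats the value's own quotes as quoting delimiters and drops them, producing exactly this symptom. A minimal Python sketch, using `shlex.split` to emulate POSIX shell parsing; the file names `in.flac` and `out.mp3` are hypothetical, while `-i` and `-metadata key=value` are standard ffmpeg options.

```python
import shlex

artist = 'bonnie "prince" billy'

# Naive: interpolate the value into a double-quoted shell string.
# The inner quotes close and reopen the shell quoting, so they vanish.
naive = f'ffmpeg -i in.flac -metadata artist="{artist}" out.mp3'
print(shlex.split(naive)[4])  # artist=bonnie prince billy

# Safer: build an argument list (e.g. for subprocess.run(args)); no shell
# parses it, so each element reaches ffmpeg unchanged.
args = ["ffmpeg", "-i", "in.flac", "-metadata", f"artist={artist}", "out.mp3"]
print(args[4])  # artist=bonnie "prince" billy

# If a single shell string is unavoidable, quote the whole argument.
quoted = f"ffmpeg -i in.flac -metadata {shlex.quote('artist=' + artist)} out.mp3"
print(shlex.split(quoted)[4] == f"artist={artist}")  # True
```

In a PHP script, `escapeshellarg()` plays the same role as `shlex.quote()` here.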

Re: getID3, ffmpeg and HTML entities

Postby James Heinrich » Sat Jun 02, 2012 9:50 am

The issue of missing quotes has nothing to do with getID3, but rather whatever program tagged that file.

If you notice, there are two similar entries in the ID3v2 tag:
[TPE1] (aka "artist" -> "Lead performer(s)/Soloist(s)")
[TPE2] (aka "band" -> "Band/orchestra/accompaniment")

In this particular sample file, both values are assigned, and to almost the same thing, but one with and one without quotes:
TPE1 = bonnie prince billy
TPE2 = bonnie "prince" billy

So if you look at [comments][band] you'll see the quoted one, and [comments][artist] has the stripped one.
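To see both values side by side, a sketch like the following can help. It uses a hypothetical Python dict that mirrors the PHP array getID3 returns for this file; getID3 stores each [comments] entry as an array of values (hence the usual [0] index). The helper name `first_value` is made up for illustration.

```python
# Hypothetical mirror of getID3's returned array for this sample file
info = {
    "comments": {
        "artist": ["bonnie prince billy"],   # from TPE1
        "band": ['bonnie "prince" billy'],   # from TPE2
    }
}

def first_value(comments, key):
    """Return the first value stored under key, or None if absent or empty."""
    values = comments.get(key) or []
    return values[0] if values else None

artist = first_value(info["comments"], "artist")
band = first_value(info["comments"], "band")

# Flag files where the two frames disagree, like this one
if artist != band:
    print(f"artist (TPE1): {artist}")
    print(f"band   (TPE2): {band}")
```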

That is perhaps an unusual practice, but it's how the program that tagged the file wrote it; I'm not sure why. getID3 simply reports what's there.

Re: getID3, ffmpeg and HTML entities

Postby nickharambee » Sat Jun 02, 2012 9:58 am

Thanks James. The original file had quotes around prince in both TP1 and TP2, so it must be that the ffmpeg conversion process is stripping them. Thanks again for your help.
