[1.8.5] Issue with encoding of MP3 tags

Postby Bevinsky » Fri Jun 17, 2011 7:21 pm

Hello, I'm using version 1.8.5 of getid3, and version 5.2.17 of PHP, together with Apache on Windows.

I have a problem viewing the proper tags for MP3 files with UTF-8 in them. The MP3 files' tags contain Japanese characters, so they are presumably encoded in UTF-8. However, even if I set $getid3->encoding to UTF-8, all the tags show up as garbage.

I've uploaded an example file here: http://www.mediafire.com/?y6wy8wybos52836. In this file, the artist tag contains Japanese characters, but they show up as garbage in getid3, no matter what I do.
Bevinsky
User
 
Posts: 3
Joined: Fri Jun 17, 2011 7:03 pm

Re: [1.8.5] Issue with encoding of MP3 tags

Postby James Heinrich » Mon Jun 20, 2011 12:49 pm

The ID3v2 portion is all UTF-16 encoded, but is still almost entirely ASCII range characters, with the exception of the artist field:
閣下
APE (UTF-8 encoding) and ID3v1 (ISO-8859-1 encoding) tags are also present, but neither attempts to represent the Japanese characters (except that the ID3v1 artist has been transliterated to "??").
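
To illustrate what "UTF-16 encoded" means for that artist field, here is a minimal sketch of converting such bytes to UTF-8 in plain PHP. It is not getID3's own code: the helper name utf16_with_bom_to_utf8() is hypothetical, the mbstring extension is assumed to be loaded, and falling back to big-endian when no byte order mark is present is an assumption too.

```php
<?php
// Hypothetical helper (not part of getID3): convert a UTF-16 string,
// optionally prefixed with a byte order mark (BOM), to UTF-8.
// Assumes the mbstring extension is available.
function utf16_with_bom_to_utf8($raw)
{
    $bom = substr($raw, 0, 2);
    if ($bom === "\xFF\xFE") {
        // Little-endian BOM: strip it and convert the rest.
        return mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16LE');
    }
    if ($bom === "\xFE\xFF") {
        // Big-endian BOM: strip it and convert the rest.
        return mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16BE');
    }
    // No BOM found: assume big-endian (an assumption made for this sketch).
    return mb_convert_encoding($raw, 'UTF-8', 'UTF-16BE');
}

// The two artist characters (U+95A3, U+4E0B) as UTF-16LE with a BOM.
$raw  = "\xFF\xFE\xA3\x95\x0B\x4E";
$utf8 = utf16_with_bom_to_utf8($raw);

echo bin2hex($utf8), "\n"; // e996a3e4b88b
```

If those six UTF-8 bytes are then displayed as ISO-8859-1, or the UTF-16 bytes are displayed without conversion, the result looks like garbage even though the data in the file is intact.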
James Heinrich
getID3() v1 developer
 
Posts: 1203
Joined: Fri May 04, 2001 11:00 am
Location: London, ON, Canada

Re: [1.8.5] Issue with encoding of MP3 tags

Postby Bevinsky » Mon Jun 20, 2011 5:00 pm

If I understand the encoding property correctly, getid3 will convert the tags in the file to that encoding when you access them in the file information array. If that is the case, why am I not getting the correct Japanese characters in UTF-8 when I set the encoding to UTF-8?
Bevinsky
User
 
Posts: 3
Joined: Fri Jun 17, 2011 7:03 pm

Re: [1.8.5] Issue with encoding of MP3 tags

Postby James Heinrich » Mon Jun 20, 2011 5:48 pm

It depends where you're looking.

In the various tag format keys (e.g. $info['id3v2'] or $info['ape']) the data is returned as it exists in the file. In some tag formats (like ID3v2) each data element (e.g. artist, album, title) can be represented with different encodings, so the actual data format may vary. When the data is compiled into $info['tags'] and $info['comments'] the encoding is translated to whatever you have selected (typically UTF-8). What data are you getting from where in the array, what values do you expect and what values are you getting?
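
As a rough sketch of the difference between the per-format keys and the compiled keys, the snippet below reads a field from the compiled part of the array. The helper first_comment_value() is hypothetical, and the array is a hand-built stand-in for an analysis result (an assumption made so the example runs without the library or the sample file).

```php
<?php
// Hypothetical helper: return the first value of a field from the compiled
// $info['comments'] key, or null if the field is absent.
function first_comment_value(array $info, $field)
{
    if (isset($info['comments'][$field][0])) {
        return $info['comments'][$field][0];
    }
    return null;
}

// Hand-built stand-in for an analysis result. The artist is written as
// UTF-8 bytes, i.e. what the compiled keys are expected to hold when the
// selected encoding is UTF-8.
$artistUtf8 = "\xE9\x96\xA3\xE4\xB8\x8B";
$info = array(
    'tags'     => array('id3v2' => array('artist' => array($artistUtf8))),
    'comments' => array('artist' => array($artistUtf8)),
);

echo first_comment_value($info, 'artist'), "\n";
var_dump(first_comment_value($info, 'album')); // NULL
```

With the real library, the array would come from $getID3->analyze($filename), followed by getid3_lib::CopyTagsToComments($info) to fill in $info['comments'].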
James Heinrich
getID3() v1 developer
 
Posts: 1203
Joined: Fri May 04, 2001 11:00 am
Location: London, ON, Canada

Re: [1.8.5] Issue with encoding of MP3 tags

Postby Bevinsky » Mon Jun 20, 2011 7:17 pm

Well, for testing I'm just using the basic demo (demo.basic.php, I think it's called). I set the filename and the encoding (to UTF-8), have it call the "copy-to-comments" function, and then let it print_r the whole thing. I don't see the two Japanese characters anywhere, in either tags or comments. I've also made sure that the browser's encoding is set to UTF-8.

One thing I did notice, though, is that one of the "tags_html" fields did have some HTML entities in it, which produce the correct characters when I copy them into an HTML document. I don't want to store HTML entities in my database, though.
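
For what it's worth, numeric HTML entities like those can be turned back into real UTF-8 characters with PHP's built-in html_entity_decode() before storing them. This is only a sketch: the wrapper name entities_to_utf8() is hypothetical, and the decimal entity values below are the code points of the two artist characters, assumed to match what "tags_html" contains.

```php
<?php
// Hypothetical wrapper: decode HTML entities (including numeric ones such
// as &#38307;) into a UTF-8 string.
function entities_to_utf8($html)
{
    return html_entity_decode($html, ENT_QUOTES, 'UTF-8');
}

// Decimal entities for U+95A3 and U+4E0B (assumed input for illustration).
$artist = entities_to_utf8('&#38307;&#19979;');

echo bin2hex($artist), "\n"; // e996a3e4b88b
```

The decoded string is plain UTF-8, so it can go into a UTF-8 database column without any entities.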
Bevinsky
User
 
Posts: 3
Joined: Fri Jun 17, 2011 7:03 pm

Re: [1.8.5] Issue with encoding of MP3 tags

Postby James Heinrich » Tue Jul 26, 2011 11:08 am

It sounds like your issue is probably similar to this one: the default encoding (before v1.9.1) is ISO-8859-1. You can easily override that when instantiating getID3:
Code:
$getID3 = new getID3;
$getID3->setOption(array('encoding' => "UTF-8"));
You can of course use another Unicode encoding if you prefer, but UTF-8 is convenient. As of v1.9.1, getID3() will use UTF-8 as the default encoding.
James Heinrich
getID3() v1 developer
 
Posts: 1203
Joined: Fri May 04, 2001 11:00 am
Location: London, ON, Canada

