Charset problems

daniil
User
Posts:5
Joined:Mon Jul 18, 2011 11:52 pm
Are you a spambot?:no
Location:Russia
Charset problems

Post by daniil » Tue Jul 19, 2011 12:08 am

Hi, and sorry for my English.
I'm using version 1.8.5-20110218 of the getID3() library and I have problems with charsets.
For some Russian songs getID3() returns something like "? ??????? (Ural Djs Radio Mix Edit)", while GNOME Nautilus (as well as Banshee and Rhythmbox) shows "Я Улетела (Ural Djs Radio Mix Edit)". So I think it's a getID3() problem. Should I update to version 2? Is it compatible with 1.8? Does it fix the charset problems?
P.S. My PHP version is 5.2.17 and yes, it supports iconv().
Sincerely,
Daniil.

James Heinrich
getID3() v1 developer
Posts:1477
Joined:Fri May 04, 2001 4:00 pm
Are you a spambot?:no
Location:Northern Ontario, Canada

Re: Charset problems

Post by James Heinrich » Tue Jul 19, 2011 12:11 am

Can you send me a sample file, please?

Re: Charset problems

Post by daniil » Tue Jul 19, 2011 12:13 am

Almost forgot. Here's an example of such a file: http://rghost.ru/15029031/private/15328 ... 3f7b7b4039

Re: Charset problems

Post by James Heinrich » Tue Jul 19, 2011 12:16 am

Thanks. After a very quick look, it seems that the title is correctly read into [comments_html], but not in [comments]. This is a getID3 bug, I will figure out why and post back with the solution.

Re: Charset problems

Post by James Heinrich » Tue Jul 19, 2011 2:23 am

The data returned by getID3() in the [tags] (and [comments]) array is by default encoded in UTF-8. Are you sure you're displaying this data with the correct encoding?

I'm not familiar with the other programs you mention, but it could be they display correctly because they're showing the ID3v1 tag and your system is set to Cyrillic codepage so the extended characters appear to be correctly mapped to the characters you expect to see (although it's debatable whether this is "correct" or not).

I also found a display issue in demo.browse.php which would not show the data correctly, even though getID3() did return the correct data. See attached screenshot.

What is your code for reading/displaying the tag data with getID3?
Attachments:
1201-tags.png (19.93 KiB)
demo.browse.php (27.26 KiB): fixes display issue for multi-byte characters
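
For reference, showing a UTF-8 tag value correctly in a web page usually comes down to declaring the page charset and escaping with that same charset. A minimal sketch, assuming $title stands in for a value taken from the [tags] or [comments] array (the variable and its contents are illustrative and are not taken from the attached demo):

```php
<?php
// Hypothetical tag value, standing in for a UTF-8 string returned by getID3().
$title = 'Я Улетела (Ural Djs Radio Mix Edit)';

// Declare the charset before any output, so the browser does not guess a legacy codepage.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Pass the charset explicitly instead of relying on the PHP version's default.
$safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

echo '<p>' . $safeTitle . "</p>\n";
```

If the page is served or saved with a different charset than the data, correct UTF-8 bytes can still appear garbled on screen.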

Re: Charset problems

Post by daniil » Tue Jul 19, 2011 11:29 am

When I used this code:

Code:

require_once('getid3/getid3.php');
$getID3 = new getID3;

$meta = $getID3->analyze(ROOT_DIR.$alterfolder."/".$fname);
getid3_lib::CopyTagsToComments($meta);

// Fall back to "Неизвестно" ("Unknown") when the ID3v2 value is missing or empty
$artist = (!empty($meta['tags']['id3v2']['artist'][0])) ? $meta['tags']['id3v2']['artist'][0] : "Неизвестно";
$title = (!empty($meta['tags']['id3v2']['title'][0])) ? $meta['tags']['id3v2']['title'][0] : "Неизвестно";

getID3() returned something like "????? ???????? — ??????????? ?????????". Now I'm trying to use:

Code:

// (...)
$artist = (!empty($meta['comments_html']["artist"][0])) ? $meta['comments_html']["artist"][0] : "Неизвестно";
$title = (!empty($meta['comments_html']["title"][0])) ? $meta['comments_html']["title"][0] : "Неизвестно";

But it returns data that looks like it is encoded in "CP1252". Example: "Âàëåðèé Ëåîíòüåâ — Íå Íàäî ßäà". It is better than "?????? ????", but still not UTF-8 :\
Also, my database charset (each track is saved in the DB) is utf8.

Re: Charset problems

Post by James Heinrich » Tue Jul 19, 2011 12:45 pm

Are you using getID3 v1.8.5 or v1.9.0? If <1.9.0, please upgrade so I'm not trying to fix old bugs.

Note that [comments] is a combination of all tag formats, and so you'll usually see something like this:
['comments']['album'][0] = "Çàæãè ïîä ëåòíèå õèòû (2011)"
['comments']['album'][1] = "Зажги под летние хиты (2011)"
because the first version comes from ID3v1 (assumed to be ISO-8859-1, and converted to UTF-8), and the second (correct) version comes from ID3v2 converted from whatever character set the tag data is written in (ISO-8859-1, UTF-8, UTF-16, etc) to UTF-8.

If you know that your ID3v1 tags are encoded in something other than ISO-8859-1, you can override it like this, and you should get the correct output in [tags] and [comments]:

Code:

$getID3 = new getID3;
$getID3->encoding_id3v1 = 'CP1252';
$meta = $getID3->analyze(ROOT_DIR.$alterfolder."/".$fname);
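
The first (garbled) album string above is what Cyrillic text looks like when single-byte Cyrillic bytes are decoded as ISO-8859-1. The byte patterns in this thread's examples match Windows-1251, so the sketch below undoes the wrong decoding and redoes it with that charset, using only PHP's iconv(). That the files are Windows-1251 is an inference from the strings quoted here, not something stated in the thread; if it holds, 'CP1251' rather than 'CP1252' would likely be the value to try for encoding_id3v1.

```php
<?php
// UTF-8 string as quoted for ['comments']['album'][0] in the post above.
$garbled = 'Çàæãè ïîä ëåòíèå õèòû (2011)';

// Step 1: undo the wrong ISO-8859-1 -> UTF-8 conversion to recover the raw tag bytes.
$rawBytes = iconv('UTF-8', 'ISO-8859-1', $garbled);

// Step 2: decode those bytes as Windows-1251 (assumed charset) and emit UTF-8.
$repaired = iconv('CP1251', 'UTF-8', $rawBytes);

echo $repaired, "\n"; // prints: Зажги под летние хиты (2011)
```

The result equals the second, correct album string quoted above, which supports the Windows-1251 assumption for this particular file.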

Re: Charset problems

Post by daniil » Thu Jul 21, 2011 9:17 am

I updated getID3() to the latest version and I still get "???????". What am I doing wrong?

Code:

$artist = (!empty($meta['comments']['artist'][1])) ? $meta['comments']['artist'][1] : false;
$title = (!empty($meta['comments']['title'][1])) ? $meta['comments']['title'][1] : false;
if (empty($artist)) {
    $artist = (!empty($meta['comments']['artist'][0])) ? iconv("CP1252", "UTF-8", $meta['comments']['artist'][0]) : "Неизвестно";
}
if (empty($title)) {
    $title = (!empty($meta['comments']['title'][0])) ? iconv("CP1252", "UTF-8", $meta['comments']['title'][0]) : "Неизвестно";
}
echo $title; // prints: 58.  ???????

Re: Charset problems

Post by daniil » Thu Jul 21, 2011 9:36 am

Thank you! Problem solved!
Here's the solution (line 2):

Code:

$getID3 = new getID3;
$getID3->setOption(array('encoding' => "UTF-8"));

Re: Charset problems

Post by James Heinrich » Thu Jul 21, 2011 11:44 am

daniil wrote: $getID3->setOption(array('encoding' => "UTF-8"));
Yes, you're absolutely right, sorry about that -- I thought that was the default encoding for the getID3 class, but it wasn't. It is now, though: I've changed the default value for v1.9.1.
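
The "?????" output seen earlier in the thread is consistent with Cyrillic text being converted to a single-byte Latin output encoding, where those letters have no representation. A minimal sketch of that effect, assuming the previous default was ISO-8859-1 (the thread only confirms that it was not UTF-8); toLatin1Lossy is a hypothetical helper written for illustration and is not part of getID3:

```php
<?php
// Hypothetical helper: convert UTF-8 to ISO-8859-1 one character at a time,
// substituting '?' for anything the target charset cannot represent.
function toLatin1Lossy($utf8)
{
    $out = '';
    // The /u modifier makes preg_split() break the string on UTF-8 characters.
    foreach (preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $lead = ord($char[0]);
        if (strlen($char) === 1) {
            // ASCII passes through unchanged.
            $out .= $char;
        } elseif (strlen($char) === 2 && ($lead === 0xC2 || $lead === 0xC3)) {
            // U+0080..U+00FF: rebuild the single Latin-1 byte from the two UTF-8 bytes.
            $out .= chr((($lead & 0x03) << 6) | (ord($char[1]) & 0x3F));
        } else {
            // No ISO-8859-1 equivalent exists (Cyrillic, for example).
            $out .= '?';
        }
    }
    return $out;
}

echo toLatin1Lossy('Я Улетела (Ural Djs Radio Mix Edit)'), "\n";
// prints: ? ??????? (Ural Djs Radio Mix Edit)
```

Once the letters have been replaced by '?', no later iconv() call can bring them back, which would explain why converting [comments] afterwards did not help and why requesting UTF-8 output up front did.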
