Charset problems

Postby daniil » Mon Jul 18, 2011 7:08 pm

Hi, and sorry for my English.
I'm using version 1.8.5-20110218 of the getID3() library and I have problems with charsets.
For some Russian songs getID3() returns something like " ? ??????? (Ural Djs Radio Mix Edit) ", while GNOME Nautilus (and Banshee and Rhythmbox) shows "Я Улетела (Ural Djs Radio Mix Edit)". So I think the problem is in getID3(). Should I update to the 2nd version? Is it compatible with 1.8? Does it fix the charset problems?
P.S. My PHP version is 5.2.17 and yes, it supports iconv().
Sincerely,
Daniil.
daniil
User
 
Posts: 5
Joined: Mon Jul 18, 2011 6:52 pm
Location: Russia

Re: Charset problems

Postby James Heinrich » Mon Jul 18, 2011 7:11 pm

Can you send me a sample file, please?
James Heinrich
getID3() v1 developer
 
Posts: 1203
Joined: Fri May 04, 2001 11:00 am
Location: London, ON, Canada

Re: Charset problems

Postby daniil » Mon Jul 18, 2011 7:13 pm

Almost forgot. Here's an example of such a file: http://rghost.ru/15029031/private/15328 ... 3f7b7b4039

Re: Charset problems

Postby James Heinrich » Mon Jul 18, 2011 7:16 pm

Thanks. After a very quick look, it seems that the title is correctly read into [comments_html], but not in [comments]. This is a getID3 bug, I will figure out why and post back with the solution.

Re: Charset problems

Postby James Heinrich » Mon Jul 18, 2011 9:23 pm

The data returned by getID3() in the [tags] (and [comments]) arrays is encoded in UTF-8 by default. Are you sure you're displaying this data with the correct encoding?

I'm not familiar with the other programs you mention, but it could be they display correctly because they're showing the ID3v1 tag and your system is set to Cyrillic codepage so the extended characters appear to be correctly mapped to the characters you expect to see (although it's debatable whether this is "correct" or not).

I also found a display issue in demo.browse.php which would not show the data correctly, even though getID3() did return the correct data. See attached screenshot.
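As an illustration of the display side of this question, here is a minimal sketch of printing a UTF-8 tag value in an HTML page. It is not the fix that went into demo.browse.php; the helper name `render_tag_value()` and the sample title are hypothetical. It assumes the script file itself is saved as UTF-8.

```php
<?php
// Hypothetical helper: escape a tag value for HTML, assuming it is UTF-8.
function render_tag_value($value)
{
    // preg_match() with the /u modifier does not return 1 on invalid UTF-8,
    // which makes it a cheap validity check without extra extensions.
    if (preg_match('//u', $value) !== 1) {
        return '[invalid UTF-8]';
    }
    // Name the charset explicitly; older PHP versions did not default to UTF-8 here.
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Tell the browser which charset the page uses (skipped on the command line).
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

echo render_tag_value('Я Улетела <Remix>'), "\n";   // prints: Я Улетела &lt;Remix&gt;
echo render_tag_value("\xDF\xE0"), "\n";            // prints: [invalid UTF-8]
```

If the page declares a different charset than the one the data is in, correct data can still look broken in the browser.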

What is your code for reading/displaying the tag data with getID3?
Attachments:
1201-tags.png (screenshot, 19.93 KiB)
demo.browse.php (fixes display issue for multi-byte characters, 27.26 KiB)

Re: Charset problems

Postby daniil » Tue Jul 19, 2011 6:29 am

When I used this code:
Code:
require_once('getid3/getid3.php');
$getID3 = new getID3;

$meta = $getID3->analyze(ROOT_DIR.$alterfolder."/".$fname);
getid3_lib::CopyTagsToComments($meta);

$artist = (!empty($meta['tags']['id3v2']["artist"][0])) ? $meta['tags']['id3v2']["artist"][0] : "Неизвестно";
$title = (!empty($meta['tags']['id3v2']["title"][0])) ? $meta['tags']['id3v2']["title"][0] : "Неизвестно";


With this code getID3() returned something like "????? ???????? — ??????????? ?????????". Now I'm trying to use:
Code:
// (...)
$artist = (!empty($meta['comments_html']["artist"][0])) ? $meta['comments_html']["artist"][0] : "Неизвестно";
$title = (!empty($meta['comments_html']["title"][0])) ? $meta['comments_html']["title"][0] : "Неизвестно";

But it returns data that looks like it is encoded in "CP1252". Example: "Âàëåðèé Ëåîíòüåâ — Íå Íàäî ßäà". That is better than "?????? ????", but it is still not UTF-8 :\
Also, my database charset (each track is saved in the DB) is utf8.

Re: Charset problems

Postby James Heinrich » Tue Jul 19, 2011 7:45 am

Are you using getID3 v1.8.5 or v1.9.0? If <1.9.0, please upgrade so I'm not trying to fix old bugs.

Note that [comments] is a combination of all tag formats, and so you'll usually see something like this:
['comments']['album'][0] = "Çàæãè ïîä ëåòíèå õèòû (2011)"
['comments']['album'][1] = "Зажги под летние хиты (2011)"
because the first version comes from ID3v1 (assumed to be ISO-8859-1, and converted to UTF-8), and the second (correct) version comes from ID3v2 converted from whatever character set the tag data is written in (ISO-8859-1, UTF-8, UTF-16, etc) to UTF-8.

If you know that your ID3v1 tags are encoded in something other than ISO-8859-1, you can override it like this, and you should then get the correct output in [tags] and [comments]:
Code:
$getID3 = new getID3;
$getID3->encoding_id3v1 = 'CP1252';
$meta = $getID3->analyze(ROOT_DIR.$alterfolder."/".$fname);
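The two [comments] variants quoted above can be reproduced without getID3. The sketch below only uses PHP's iconv(); the function names are hypothetical, and it assumes the script file is saved as UTF-8. It shows that the garbled album title is what Windows-1251 (CP1251) bytes look like when they are treated as ISO-8859-1, and that the damage is reversible.

```php
<?php
// Hypothetical helper: simulate Windows-1251 tag bytes being read as ISO-8859-1.
function misread_as_latin1($utf8Text)
{
    $cp1251Bytes = iconv('UTF-8', 'CP1251', $utf8Text);   // bytes as they would sit in the tag
    return iconv('ISO-8859-1', 'UTF-8', $cp1251Bytes);    // wrong guess about the source charset
}

// Hypothetical helper: undo that wrong guess and decode the bytes as Windows-1251.
function repair_latin1_mojibake($garbled)
{
    $rawBytes = iconv('UTF-8', 'ISO-8859-1', $garbled);   // recover the original bytes
    if ($rawBytes === false) {
        return false;                                     // text was not pure Latin-1 mojibake
    }
    return iconv('CP1251', 'UTF-8', $rawBytes);
}

$garbled = misread_as_latin1('Зажги под летние хиты (2011)');
echo $garbled, "\n";                          // prints: Çàæãè ïîä ëåòíèå õèòû (2011)
echo repair_latin1_mojibake($garbled), "\n";  // prints: Зажги под летние хиты (2011)
```

This suggests, as an assumption rather than something confirmed in this thread, that for Cyrillic ID3v1 tags the value to try for encoding_id3v1 is 'CP1251' rather than 'CP1252'.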

Re: Charset problems

Postby daniil » Thu Jul 21, 2011 4:17 am

I updated getID3() to the latest version and I still get "???????". What am I doing wrong?
Code:
         $artist = (!empty($meta['comments']['artist'][1])) ? $meta['comments']["artist"][1] : false;
         $title = (!empty($meta['comments']["title"][1])) ? $meta['comments']["title"][1] : false;
         if (empty($artist))
            $artist = (!empty($meta['comments']['artist'][0])) ? iconv("CP1252", "UTF-8", $meta['comments']['artist'][0]) : "Неизвестно";
         if (empty($title))
            $title = (!empty($meta['comments']["title"][0])) ? iconv("CP1252", "UTF-8", $meta['comments']["title"][0]) : "Неизвестно";
         echo $title; //returns 58.  ???????

Re: Charset problems

Postby daniil » Thu Jul 21, 2011 4:36 am

Thank you! Problem solved!
Here's the solution (line 2):
Code:
$getID3 = new getID3;
$getID3->setOption(array('encoding' => "UTF-8"));

Re: Charset problems

Postby James Heinrich » Thu Jul 21, 2011 6:44 am

daniil wrote: $getID3->setOption(array('encoding' => "UTF-8"));
Yes, you're absolutely right, sorry about that -- I thought UTF-8 was the default encoding for the getID3 class, but it wasn't. It is now -- I've changed the default value for v1.9.1.
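To illustrate why the output encoding matters, here is a small sketch that does not use getID3 at all. It assumes the mbstring extension is available, that the script file is saved as UTF-8, and (this is an assumption, the thread does not say what the old default was) that the previous output encoding was a single-byte Western charset such as ISO-8859-1. Such a charset has no Cyrillic letters, so every one of them is replaced by a placeholder.

```php
<?php
// Sample title taken from the first post of this thread.
$title = 'Я Улетела (Ural Djs Radio Mix Edit)';

// Make the placeholder for unrepresentable characters explicit: '?' (0x3F).
mb_substitute_character(0x3F);

// Converting to a charset without Cyrillic letters loses them.
$asLatin1 = mb_convert_encoding($title, 'ISO-8859-1', 'UTF-8');
echo $asLatin1, "\n";   // prints: ? ??????? (Ural Djs Radio Mix Edit)

// Keeping UTF-8 as the target leaves the text intact.
$asUtf8 = mb_convert_encoding($title, 'UTF-8', 'UTF-8');
echo $asUtf8, "\n";     // prints: Я Улетела (Ural Djs Radio Mix Edit)
```

The first line of output matches the symptom reported at the start of the thread, which is consistent with the setOption() fix, though the sketch does not prove what getID3 did internally.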