HTML paste and HTML export

K.A. · Post by **K.A.** » Wed Feb 22, 2006 8:13 am

I pasted Russian text from Internet Explorer.
Everything looks OK.
But when I then use SaveHTMLToStream to produce the HTML text,
I have a problem. I get text like this: программируемы&#1077

Another case: I entered the text in the editor from the keyboard, not from the clipboard.
Everything is OK. In the HTML my Russian characters are readable.
I think this is because of 2-byte Unicode encoding. But where does it come from?
How can I correct this?

K.A. · Post by **K.A.** » Wed Feb 22, 2006 9:18 am

Sorry, the incorrect text in my post was automatically converted back to correct text,
and only one such symbol is left:
&#1077
Such a symbol is &# followed by 4 digits.

Post by **Sergey Tkachenko** » Wed Feb 22, 2006 7:45 pm

This means your TRichView uses Unicode.
By default, TRichView saves non-English Unicode characters using their codes, like &#1077. This is correct, but files may become too large.

A more elegant way to save a Unicode document is to use UTF-8 encoding: include rvsoUTF8 in the Options parameter of SaveHTML/SaveHTMLEx.
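
A minimal sketch of that suggestion, for the stream variant mentioned in the first post. It assumes that SaveHTMLToStream takes (Stream, Path, Title, ImagesPrefix, Options), so please check the argument list against your TRichView version. The procedure name, the title and the image prefix are placeholders, not part of the library.

```pascal
// Sketch only: saves the document as UTF-8 HTML into a stream.
// Assumption: SaveHTMLToStream(Stream, Path, Title, ImagesPrefix, Options).
// SaveAsUtf8Html, 'Document' and 'img' are hypothetical placeholders.
procedure SaveAsUtf8Html(rv: TCustomRichView; Stream: TStream);
begin
  // rvsoUTF8 asks the exporter to write UTF-8 text
  // instead of numeric codes like &#1077
  rv.SaveHTMLToStream(Stream, '', 'Document', 'img', [rvsoUTF8]);
end;
```

The result is readable only in a viewer that treats the file as UTF-8. Opened as a single-byte ANSI text, the same bytes appear as pairs of accented Latin characters.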

K.A. · Post by **K.A.** » Thu Feb 23, 2006 4:05 am

Thanks, but I believed I had switched every setting in dbinspector to non-Unicode. Maybe there is another place where Unicode must be turned off? I do not need it.
I simply want my characters to be readable. They are readable when I type from the keyboard, but not when I paste from the clipboard (Word 2003 or Internet Explorer).
I assume the problem is in the HTML import. Can I make the import not use Unicode?

I tried rvsoUTF8; as a result I see ÑÐ¾Ð²Ð¼ÐµÑÑ‚Ð¸Ð¼Ñ, which is not readable either.

OK... I wrote a workaround that decodes codes like &#1077 and replaces them with Russian characters.
But this is not a fully correct solution, because the HTML text is large and complex.
I am not sure my parser correctly finds all the codes, and only the ones I want.
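
For reference, a decoder of this kind could look roughly like the sketch below. It is plain Pascal and does not use TRichView. The name DecodeNumericRefs is hypothetical. It handles only decimal codes up to 65535, accepts the code with or without a trailing semicolon, and leaves everything else (including hexadecimal codes like &#x435;) unchanged.

```pascal
// Sketch only: replaces decimal numeric character references (&#1077;)
// with the corresponding UTF-16 character. Anything that is not "&#"
// followed by digits in the range 1..65535 is copied unchanged.
function DecodeNumericRefs(const S: WideString): WideString;
var
  i, j, Code: Integer;
begin
  Result := '';
  i := 1;
  while i <= Length(S) do
  begin
    if (S[i] = '&') and (i < Length(S)) and (S[i + 1] = '#') then
    begin
      // Collect decimal digits after "&#"; stop early if the value overflows
      j := i + 2;
      Code := 0;
      while (j <= Length(S)) and (S[j] >= '0') and (S[j] <= '9')
        and (Code <= $FFFF) do
      begin
        Code := Code * 10 + (Ord(S[j]) - Ord('0'));
        Inc(j);
      end;
      if (j > i + 2) and (Code > 0) and (Code <= $FFFF) then
      begin
        Result := Result + WideChar(Code);
        // The terminating semicolon is optional here
        if (j <= Length(S)) and (S[j] = ';') then
          Inc(j);
        i := j;
        Continue;
      end;
    end;
    // Not a decodable reference: copy the character as it is
    Result := Result + S[i];
    Inc(i);
  end;
end;
```

The function returns Unicode text, so it still has to be converted to the Russian code page before it is written to an ANSI file. It also does not distinguish text from tags or attribute values, which is the weak point of this workaround.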

Post by **Sergey Tkachenko** » Sun Feb 26, 2006 10:23 am

Characters like &#1077 in TRichView's HTML export mean that the text in TRichView has Unicode encoding, at least partially.

Older versions of RvHtmlImporter did not use Unicode for adding text. The newest version uses TextStyle[0].Unicode to determine whether it should create Unicode styles or not (if ClearDocument=False).
I can explain how to convert existing documents to ANSI, but TRichView cannot save multilingual ANSI files (containing text of different charsets) in HTML properly. RUSSIAN_CHARSET contains both Russian and English characters, so there is no problem here, but a Russian+Greek document cannot be saved (without Unicode) properly: only the Russian or only the Greek text can be viewed normally.

UTF-8 files can be viewed and edited in capable text editors. For example, the standard Windows XP Notepad supports UTF-8.

K.A. · Post by **K.A.** » Mon Feb 27, 2006 10:22 am

Sergey Tkachenko wrote: Characters like &#1077 in TRichView's HTML export mean that the text in TRichView has Unicode encoding, at least partially.

That is about the export operation. But what about import? How can I turn off Unicode in the import?

Sergey Tkachenko wrote: Older versions of RvHtmlImporter did not use Unicode for adding text. The newest version uses TextStyle[0].Unicode to determine whether it should create Unicode styles or not (if ClearDocument=False).

I use TextStyle[0].Unicode=false.

Sergey Tkachenko wrote: I can explain how to convert existing documents to ANSI, but TRichView cannot save multilingual ANSI files (containing text of different charsets) in HTML properly. RUSSIAN_CHARSET contains both Russian and English characters, so there is no problem here, but a Russian+Greek document cannot be saved (without Unicode) properly: only the Russian or only the Greek text can be viewed normally.

I do not need any other charset, only Russian and English.

Sergey Tkachenko wrote: UTF-8 files can be viewed and edited in capable text editors. For example, the standard Windows XP Notepad supports UTF-8.

I know. But I post the exported HTML to a system that does not process Unicode correctly.

Post by **Sergey Tkachenko** » Thu Mar 02, 2006 5:30 pm

Unicode characters in TRichView's HTML output mean that this text has Unicode encoding in TRichView.
You can convert the document from Unicode using the ConvertFromUnicode procedure:

// Converts all Unicode text items in RVData to ANSI,
// recursing into table cells.
procedure ConvertRVFromUnicode(RVData: TCustomRVData);
var i,r,c, StyleNo: Integer;
    table: TRVTableItemInfo;
begin
  for i := 0 to RVData.ItemCount-1 do begin
    StyleNo := RVData.GetItemStyle(i);
    if StyleNo>=0 then begin
      // Text item: StyleNo is an index in TextStyles
      if RVData.GetRVStyle.TextStyles[StyleNo].Unicode then begin
        RVData.Items[i] := RVData.GetItemTextA(i);
        Exclude(RVData.GetItem(i).ItemOptions, rvioUnicode);
      end;
    end
    else if StyleNo=rvsTable then begin
      // Table: convert the contents of every existing (non-nil) cell
      table := TRVTableItemInfo(RVData.GetItem(i));
      for r := 0 to table.Rows.Count-1 do
        for c := 0 to table.Rows[r].Count-1 do
          if table.Cells[r,c]<>nil then
            ConvertRVFromUnicode(table.Cells[r,c].GetRVData);
    end;
  end;
end;

// Converts the whole document, then marks all text styles as non-Unicode.
procedure ConvertFromUnicode(rv: TCustomRichView);
var i: Integer;
begin
  ConvertRVFromUnicode(rv.RVData);
  for i := 0 to rv.Style.TextStyles.Count-1 do
    rv.Style.TextStyles[i].Unicode := False;
end;

To make sure that this conversion will be to Russian, you can call:

  for i := 0 to rv.Style.TextStyles.Count-1 do
    rv.Style.TextStyles[i].Charset := RUSSIAN_CHARSET;

before calling ConvertFromUnicode.
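
Put together, the order of calls might look like the sketch below. It assumes that ConvertFromUnicode from this post is in scope and that SaveHTMLToStream takes (Stream, Path, Title, ImagesPrefix, Options); please check the argument list against your TRichView version. The procedure name, the title and the image prefix are placeholders.

```pascal
// Sketch only: convert the document to Russian ANSI text, then export HTML.
// Assumptions: ConvertFromUnicode (above) is visible here;
// SaveHTMLToStream(Stream, Path, Title, ImagesPrefix, Options).
// SaveRussianAnsiHtml, 'Document' and 'img' are hypothetical placeholders.
procedure SaveRussianAnsiHtml(rv: TCustomRichView; Stream: TStream);
var i: Integer;
begin
  // 1. Make sure the conversion targets the Russian charset
  for i := 0 to rv.Style.TextStyles.Count-1 do
    rv.Style.TextStyles[i].Charset := RUSSIAN_CHARSET;
  // 2. Convert all text items and styles from Unicode to ANSI
  ConvertFromUnicode(rv);
  // 3. Export without rvsoUTF8, so the text is written as ANSI
  rv.SaveHTMLToStream(Stream, '', 'Document', 'img', []);
end;
```

If the control is visible, it probably needs to be reformatted (rv.Format) after the conversion, because the item texts have changed.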