HTML paste and HTML export

General TRichView support forum. Please post your questions here
K.A.
Posts: 5
Joined: Wed Feb 22, 2006 8:04 am

Post by K.A. »

I pasted Russian text from Internet Explorer.
Everything looks OK in the editor.
But when I use SaveHTMLToStream to produce HTML text, I have a problem.
The text looks like this: программируемы&#1077

Another case: I typed text into the editor from the keyboard, not from the clipboard.
Everything is OK. In the HTML my Russian characters are readable.
I think this is because of Unicode 2-byte encoding. But where?
How can I correct this?
K.A.
Posts: 5
Joined: Wed Feb 22, 2006 8:04 am

Post by K.A. »

Sorry, my incorrect text was automatically converted to correct text,
so only one such symbol remains:
&#1077
Each such symbol is &# followed by 4 digits.
Sergey Tkachenko
Site Admin
Posts: 17557
Joined: Sat Aug 27, 2005 10:28 am

Post by Sergey Tkachenko »

This means your TRichView uses Unicode.
By default, TRichView saves non-English Unicode characters using their codes, like &#1077. This is correct, but files may become too large.

A more elegant way to save a Unicode document is to use UTF-8 encoding: include rvsoUTF8 in the Options parameter of SaveHTML/SaveHTMLEx.
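
As a rough illustration of the suggestion above, a call might look like the sketch below. The wrapper name SaveAsUtf8Html, the title and the image prefix are made up for the example. It assumes SysUtils, RVStyle and RichView are in the uses clause, and the SaveHTML parameter order (file name, title, image prefix, options) should be checked against your TRichView version.

```pascal
// Sketch only: hypothetical helper, not part of TRichView.
// Assumes SysUtils, RVStyle and RichView are listed in the uses clause.
procedure SaveAsUtf8Html(rv: TCustomRichView; const FileName: String);
begin
  // rvsoUTF8 asks the exporter to write UTF-8 text instead of &#NNNN codes.
  // Parameters: file name, page title, image file prefix, save options.
  if not rv.SaveHTML(FileName, 'Exported document', 'img', [rvsoUTF8]) then
    raise Exception.Create('Cannot save ' + FileName);
end;
```

Whatever reads the result has to treat it as UTF-8. If the bytes are interpreted in a single-byte code page, the Russian text will look garbled.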
K.A.
Posts: 5
Joined: Wed Feb 22, 2006 8:04 am

Post by K.A. »

Thanks, but I believe I have switched every place in dbinspector to NO UNICODE. Maybe there is another place where Unicode can be turned off? I do not need it.
I simply want my characters to be readable. They are correct if I type from the keyboard, but not if I paste from the clipboard (Word 2003 or Internet Explorer).
I assume the problem is in HTMLImport. Can I make the import not use Unicode?

I tried rvsoUTF8; as a result I see Ñ￾овмеÑ￾Ñ‚Ð¸Ð¼Ñ which is not readable either.

OK.... I wrote a workaround that decodes &#1077 and replaces it with Russian characters.
But this is not a fully correct way, because the HTML text is large and complex.
I am not sure my parser correctly finds all of the wanted sequences, and only those.
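
The workaround described above could be sketched roughly as follows. This is not the poster's actual code. DecodeNumericRefs is a hypothetical helper, and it assumes a Unicode-string compiler (Delphi 2009 or later), so the result would still have to be converted to the target code page afterwards. It handles only decimal references in the Basic Multilingual Plane and leaves codes below 128 untouched, so that escaped markup such as &#60; is not turned back into a tag.

```pascal
// Sketch only: hypothetical helper, assumes string = UnicodeString.
// Replaces decimal references such as &#1077; (or &#1077 without the
// semicolon) with the character they denote; everything else is copied as-is.
function DecodeNumericRefs(const S: string): string;
var
  i, j, Code: Integer;
begin
  Result := '';
  i := 1;
  while i <= Length(S) do
  begin
    if (S[i] = '&') and (i < Length(S)) and (S[i + 1] = '#') then
    begin
      // Read at most 7 decimal digits after '&#'
      j := i + 2;
      Code := 0;
      while (j <= Length(S)) and (j < i + 9) and
            (S[j] >= '0') and (S[j] <= '9') do
      begin
        Code := Code * 10 + (Ord(S[j]) - Ord('0'));
        Inc(j);
      end;
      // Decode only non-ASCII BMP characters, excluding surrogates
      if (j > i + 2) and (Code >= 128) and (Code <= $FFFF) and
         ((Code < $D800) or (Code > $DFFF)) then
      begin
        Result := Result + Char(Code);
        if (j <= Length(S)) and (S[j] = ';') then
          Inc(j); // swallow the optional terminating semicolon
        i := j;
        Continue;
      end;
    end;
    // Not a reference we decode: copy the character unchanged
    Result := Result + S[i];
    Inc(i);
  end;
end;
```

Because it only looks for the literal pattern &# followed by digits, it does not need to parse the HTML tags at all, which sidesteps the concern about a large and complex document.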
Sergey Tkachenko
Site Admin
Posts: 17557
Joined: Sat Aug 27, 2005 10:28 am

Post by Sergey Tkachenko »

Characters like &#1077 in TRichView's HTML export mean that the text in TRichView has Unicode encoding, at least partially.

Older versions of RvHtmlImporter did not use Unicode for adding text. The newest version uses TextStyle[0].Unicode to determine if it should create Unicode styles or not (if ClearDocument=False).
I can explain how to convert existing documents to ANSI, but TRichView cannot properly save multilingual ANSI files (containing text of different charsets) as HTML. RUSSIAN_CHARSET contains both Russian and English characters, so there is no problem here, but a Russian+Greek document cannot be saved properly without Unicode: only the Russian or only the Greek text can be viewed normally.

UTF-8 files can be viewed and edited in capable text editors. For example, the standard Windows XP Notepad supports UTF-8.
K.A.
Posts: 5
Joined: Wed Feb 22, 2006 8:04 am

Post by K.A. »

Sergey Tkachenko wrote: Characters like &#1077 in TRichView's HTML export mean that the text in TRichView has Unicode encoding, at least partially.
That is about export. But what about import? How can I turn off Unicode in import?
Sergey Tkachenko wrote: Older versions of RvHtmlImporter did not use Unicode for adding text. The newest version uses TextStyle[0].Unicode to determine if it should create Unicode styles or not (if ClearDocument=False).
I use TextStyle[0].Unicode=False.
Sergey Tkachenko wrote: I can explain how to convert existing documents to ANSI, but TRichView cannot properly save multilingual ANSI files (containing text of different charsets) as HTML. RUSSIAN_CHARSET contains both Russian and English characters, so there is no problem here, but a Russian+Greek document cannot be saved properly without Unicode: only the Russian or only the Greek text can be viewed normally.
I do not need any other charset. Only Russian and English.
Sergey Tkachenko wrote: UTF-8 files can be viewed and edited in capable text editors. For example, the standard Windows XP Notepad supports UTF-8.
I know. But I post the exported HTML to a system that does not process Unicode correctly. :(
Sergey Tkachenko
Site Admin
Posts: 17557
Joined: Sat Aug 27, 2005 10:28 am

Post by Sergey Tkachenko »

Unicode characters in TRichView's HTML output mean that this text has Unicode encoding in TRichView.
You can convert the document from Unicode using the following ConvertFromUnicode procedure:

Code: Select all

// Units assumed in the uses clause: CRVData, RVItem, RVStyle, RVTable, RichView
// (check the unit names against your TRichView version).

// Converts all Unicode text items in RVData to ANSI,
// recursing into table cells.
procedure ConvertRVFromUnicode(RVData: TCustomRVData);
var i,r,c, StyleNo: Integer;
    table: TRVTableItemInfo;
begin
  for i := 0 to RVData.ItemCount-1 do begin
    StyleNo := RVData.GetItemStyle(i);
    if StyleNo>=0 then begin
      // Non-negative style numbers are text items
      if RVData.GetRVStyle.TextStyles[StyleNo].Unicode then begin
        RVData.Items[i] := RVData.GetItemTextA(i);
        Exclude(RVData.GetItem(i).ItemOptions, rvioUnicode);
      end;
    end
    else if StyleNo=rvsTable then begin
      // Tables contain nested documents: process every existing cell
      table := TRVTableItemInfo(RVData.GetItem(i));
      for r := 0 to table.Rows.Count-1 do
        for c := 0 to table.Rows[r].Count-1 do
          if table.Cells[r,c]<>nil then
            ConvertRVFromUnicode(table.Cells[r,c].GetRVData);
    end;
  end;
end;

// Converts the whole document, then marks all text styles as non-Unicode.
procedure ConvertFromUnicode(rv: TCustomRichView);
var i: Integer;
begin
  ConvertRVFromUnicode(rv.RVData);
  for i := 0 to rv.Style.TextStyles.Count-1 do
    rv.Style.TextStyles[i].Unicode := False;
end;
To make sure that this conversion will be to Russian, you can call:

Code: Select all

  for i := 0 to rv.Style.TextStyles.Count-1 do
    rv.Style.TextStyles[i].Charset := RUSSIAN_CHARSET;
before calling ConvertFromUnicode.
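
Put together, the order of operations might look like the sketch below. SaveAsRussianAnsiHtml, the title and the image prefix are hypothetical. It assumes Windows, RVStyle and RichView are in the uses clause and that ConvertFromUnicode from this post is visible, and the SaveHTML parameter order should be checked against your TRichView version.

```pascal
// Sketch only: hypothetical helper combining the steps from this post.
procedure SaveAsRussianAnsiHtml(rv: TCustomRichView; const FileName: String);
var i: Integer;
begin
  // 1. Set the target charset first, so the conversion is to Russian
  for i := 0 to rv.Style.TextStyles.Count-1 do
    rv.Style.TextStyles[i].Charset := RUSSIAN_CHARSET;
  // 2. Convert all text items and text styles from Unicode to ANSI
  ConvertFromUnicode(rv);
  // 3. Export without rvsoUTF8
  //    (file name, page title, image file prefix, save options)
  rv.SaveHTML(FileName, 'Exported document', 'img', []);
end;
```

After the conversion the text is no longer stored as Unicode, so the export is not expected to contain &#NNNN codes for Russian characters.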