loading a Unicode file line by line

Cosmin3
Posts: 54
Joined: Sat Apr 05, 2008 12:04 pm

Post by Cosmin3 »

Hi.
I have a question.
I'm loading a UTF-8 Unicode file into a string array like this: I load the entire text into a string, then, from the third character to the end, I split it into lines (at #13#0#10#0).
After making some modifications to the lines, I want to load them into RichViewEdit and then save all the text as Unicode or ANSI.
I tried "AddTextNLW", but I see some strange characters in the text, and when I save I get a file that can't be loaded in any text editor.
What should I do? Please help me... Thank you.
Cosmin3
Posts: 54
Joined: Sat Apr 05, 2008 12:04 pm

Post by Cosmin3 »

I know this example (from the Help) and I tried it, but it's not working...
That is because it loads all the text at once, while I'm loading line by line...
Sergey Tkachenko
Site Admin
Posts: 17557
Joined: Sat Aug 27, 2005 10:28 am

Post by Sergey Tkachenko »

1. Load a line into s: String. It will contain text in UTF-8 encoding.
2. Convert the text to WideString (UTF-16): ws := UTF8Decode(s), where ws is a WideString.
3. Use AddNLWTag or AddTextNLW to add ws to TRichView.
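
Step 2 can be tried in isolation. What follows is only a minimal console sketch, not code from TRichView: DecodeUtf8Line is a hypothetical helper name, and the sample bytes are made up for illustration. It shows that UTF8Decode turns a UTF-8 byte string into a WideString with one element per UTF-16 code unit; the result is what would then be passed to AddTextNLW in step 3.

```pascal
program Utf8LineDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper (not part of TRichView): converts one line of
// UTF-8 bytes to UTF-16.
function DecodeUtf8Line(const Line: UTF8String): WideString;
begin
  Result := UTF8Decode(Line);
end;

var
  s: UTF8String;
  ws: WideString;
begin
  // Build the UTF-8 bytes for "cafe" with an accented last letter by hand:
  // U+00E9 is encoded as the two bytes C3 A9.
  SetLength(s, 5);
  s[1] := 'c';
  s[2] := 'a';
  s[3] := 'f';
  s[4] := AnsiChar($C3);
  s[5] := AnsiChar($A9);

  ws := DecodeUtf8Line(s);

  Writeln('UTF-8 bytes:       ', Length(s));                 // 5
  Writeln('UTF-16 code units: ', Length(ws));                // 4
  Writeln('Last code unit:    $', IntToHex(Ord(ws[4]), 4));  // 00E9
end.
```

If the line is really UTF-16 rather than UTF-8, this conversion does not apply, because the bytes are then not valid UTF-8 text.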
Cosmin3
Posts: 54
Joined: Sat Apr 05, 2008 12:04 pm

Post by Cosmin3 »

Yes, I do that but it's not working...

This is what I see in the editor: http://www.imagehosting.gr/show.php/974 ... e.PNG.html

PS: I checked the code that extracts the lines from the text again: it works 100% fine.
The text should begin like this: "LETHAL WEAPON 4.#13#10Riggs,...".
Sergey Tkachenko
Site Admin
Posts: 17557
Joined: Sat Aug 27, 2005 10:28 am

Post by Sergey Tkachenko »

Please post your code here.
Cosmin3
Posts: 54
Joined: Sat Apr 05, 2008 12:04 pm

Post by Cosmin3 »

If I do this:

ScaleRichView.RichviewEdit.LoadTextW(FileName, 0, 0, False);
ScaleRichView.RichviewEdit.Format;

then it works OK.

But if I do this:

Stream := TFileStream.Create(FileName, fmOpenRead);
SetLength(s, Stream.Size);
Stream.ReadBuffer(PChar(s)^, Stream.Size);
ScaleRichView.RichviewEdit.AddTextNLW(s, 0, 0, 0, False);
Stream.Free;
ScaleRichView.RichviewEdit.Format;

then I get the text as you see in the picture. This code is taken from TCustomRichView.LoadTextW >> TCustomRVData.LoadTextW >> TCustomRVData.LoadTextFromStreamW.

It doesn't matter whether it's a single line or the entire text, and it doesn't matter whether I use UTF8Decode or not.
Maybe I'm doing something wrong, but what is it?
Sergey Tkachenko
Site Admin
Posts: 17557
Joined: Sat Aug 27, 2005 10:28 am

Post by Sergey Tkachenko »

If the file can be loaded by LoadTextW, it is not a UTF-8 file but a UTF-16 file (each character = 2 bytes).
Your code is not correct. You load the file into a String, so each Unicode character is read as two adjacent characters. When you pass this string to the WideString parameter of TRichView.AddTextNLW, the string is implicitly converted to WideString, which makes no sense if the string contains data like this.

Why does similar code work in TCustomRVData? Because TCustomRVData.AddTextNLW is different from TRichView.AddTextNLW and is intended for private use. While TRichView.AddTextNLW expects a WideString parameter, TCustomRVData.AddTextNLW expects a String containing data like yours (each Unicode character in two adjacent string characters).
The correct code:

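// Requires Classes (TFileStream) and SysUtils (Exception) in the uses clause.
var
  s: WideString;
  Stream: TFileStream;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead);
  try
    if Stream.Size mod 2 = 1 then
      raise Exception.Create('The file is not Unicode UTF-16')
    else begin
      // One WideChar for every 2 bytes of the file
      SetLength(s, Stream.Size div 2);
      Stream.ReadBuffer(Pointer(s)^, Stream.Size);
      ScaleRichView.RichviewEdit.AddTextNLW(s, 0, 0, 0, False);
    end;
  finally
    Stream.Free;
  end;
  ScaleRichView.RichviewEdit.Format;
end;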
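
One detail is not covered in this post. The first message mentions splitting "from the third character", which suggests the file may start with a two-byte byte order mark. If that is the case (an assumption, it is not confirmed in this thread), a WideString filled directly from the file would begin with the character U+FEFF. A minimal sketch of removing it, where StripUtf16Bom is a hypothetical helper and only little-endian UTF-16 is considered:

```pascal
program StripBomDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: removes a leading U+FEFF, which is how the byte
// order mark appears after a little-endian UTF-16 file has been read
// into a WideString.
function StripUtf16Bom(const ws: WideString): WideString;
begin
  if (Length(ws) > 0) and (ws[1] = WideChar($FEFF)) then
    Result := Copy(ws, 2, Length(ws) - 1)
  else
    Result := ws;
end;

var
  ws: WideString;
begin
  // Simulate the contents read from a file: BOM followed by "Hi".
  SetLength(ws, 3);
  ws[1] := WideChar($FEFF);
  ws[2] := 'H';
  ws[3] := 'i';

  Writeln(Length(ws));                 // 3
  Writeln(Length(StripUtf16Bom(ws)));  // 2
end.
```

With such a helper, the string could be passed through StripUtf16Bom before it is given to AddTextNLW. Whether that is needed depends on the file.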
Cosmin3
Posts: 54
Joined: Sat Apr 05, 2008 12:04 pm

Post by Cosmin3 »

I understand, thank you very much for your help.
Cosmin3
Posts: 54
Joined: Sat Apr 05, 2008 12:04 pm

Post by Cosmin3 »

Just one small problem.
For example, I have the item "Hello world!" (index 0), for which IsFromNewLine returns True.
I insert a special character with Insert >> Symbol.
Now I have 3 items:
Item[0] = 'Hello'
Item[1] = special character
Item[2] = ' world!'
I understand that IsFromNewLine(2) = True (that's normal), but why does IsFromNewLine(0) also return True? The strange thing is that the new line is not visible, and when I save the text with "Save..." it isn't saved either.
I ask because I don't save the text with "Save..."; instead, I get the text from each item with GetTextA/W and add #13#10 if IsFromNewLine returns True. In this case Item[1] is on a new line...
What can I do?
Sergey Tkachenko
Site Admin
Posts: 17557
Joined: Sat Aug 27, 2005 10:28 am

Post by Sergey Tkachenko »

IsFromNewLine(0) is always true (because any document starts from a new line). If you use your own function getting text from RichView, do not add #13#10 for the 0th item.
Cosmin3
Posts: 54
Joined: Sat Apr 05, 2008 12:04 pm

Post by Cosmin3 »

Thank you, but it doesn't work well if I don't insert the character.
Suppose I have this text on disk:

Hello World!
How are you?

Item[0] = 'Hello World!' and IsFromNewLine(0) = True.
If I save now as you said, the text becomes:

Hello World!How are you?
Sergey Tkachenko
Site Admin
Posts: 17557
Joined: Sat Aug 27, 2005 10:28 am

Post by Sergey Tkachenko »

If there are two paragraphs, each containing one item, both IsFromNewLine(0) and IsFromNewLine(1) are True.

text := ''; 
for i := 0 to Editor.ItemCount-1 do 
begin 
  if (i>0) and Editor.IsFromNewLine(i) then 
    text := text + #13#10; 
  if Editor.GetItemStyle(i)=rvsTab then 
    text := text + #9 
  else if Editor.GetItemStyle(i)>=0 then 
    text := text + Editor.GetItemTextA(i); 
end;
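
The rule in this loop can be checked without an editor. The sketch below is a stand-alone simulation, not TRichView code: TDemoItem, MakeItem and JoinItems are hypothetical names, and the FromNewLine field stands in for what IsFromNewLine would return for each item. The item list is made up: two paragraphs, the second one split into two items.

```pascal
program JoinItemsDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical stand-in for a document item.
  TDemoItem = record
    Text: string;
    FromNewLine: Boolean;  // what IsFromNewLine would return for the item
  end;

function MakeItem(const AText: string; AFromNewLine: Boolean): TDemoItem;
begin
  Result.Text := AText;
  Result.FromNewLine := AFromNewLine;
end;

// Same rule as the loop above: a line break goes before every item that
// starts a paragraph, except the very first item of the document.
function JoinItems(const Items: array of TDemoItem): string;
var
  i: Integer;
begin
  Result := '';
  for i := 0 to High(Items) do
  begin
    if (i > 0) and Items[i].FromNewLine then
      Result := Result + #13#10;
    Result := Result + Items[i].Text;
  end;
end;

var
  Items: array[0..2] of TDemoItem;
begin
  Items[0] := MakeItem('Hello World!', True);  // first paragraph
  Items[1] := MakeItem('How are ', True);      // second paragraph starts here
  Items[2] := MakeItem('you?', False);         // continues the same paragraph
  Writeln(JoinItems(Items));
end.
```

Items whose flag is False are simply appended to the current line, so splitting a paragraph into several items does not change the joined text.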
Cosmin3
Posts: 54
Joined: Sat Apr 05, 2008 12:04 pm

Post by Cosmin3 »

Seems to be working. Thank you.
Cosmin3
Posts: 54
Joined: Sat Apr 05, 2008 12:04 pm

Post by Cosmin3 »

Sorry to bother you again, but I came across a text where I can't use your code.
It looks like this:

Item[0] = 'LETHAL WEAPON 4.' IsFromNewLine = True
Item[1] = 'Riggs, are you ...' IsFromNewLine = True

I insert a character in the first item. Now it looks like this:

Item[0] = 'LETHAL' IsFromNewLine = True
Item[1] = character IsFromNewLine = False
Item[2] = ' WEAPON 4.' IsFromNewLine = False
Item[3] = 'Riggs, are you...' IsFromNewLine = True

The problem is that IsFromNewLine(2) switched from True to False.
And it doesn't happen only on the first line of the text. Wherever I break an item (which is a line) into three, the first piece has IsFromNewLine = True and the last has False.
If I convert the text to RTF, load the file with LoadRtf, and test the items before and after I insert the character, I get the same result.

PS: It's a normal text, nothing special about it, but if you want I will send it to you.