Possible hyperlink bug

jgkoehn
Posts: 303
Joined: Thu Feb 20, 2020 9:32 pm

Possible hyperlink bug

Post by jgkoehn »

Greetings Sergey,
A coworker found this.
Please load the attached RTF into a compiled demo using the latest RVF (18.3, I believe).
Save it without changing anything, then look at the RTF code. The hyperlink code appears to double for the Unicode links.
Now load that saved file and save it again. The hyperlink code appears to double again for the Unicode links.
Attachment: Testing popups for lemma Greek.rtf (49.97 KiB)
Sergey Tkachenko
Site Admin
Posts: 17566
Joined: Sat Aug 27, 2005 10:28 am

Re: Possible hyperlink bug

Post by Sergey Tkachenko »

Sorry, I cannot reproduce the problem. At least, if I save RTF, open it, and save RTF again, these two new RTF files are identical.
Please tell me what exactly is wrong.

PS: there is one effect which, I believe, is undesired: the path to the RTF file is added to your custom hyperlinks, because the component thinks that these links are local.
There are two options to avoid it:
1) Assign RichView.RTFReadProperties.BasePathLinks := False
2) Or you can assign your own function to RVIsCustomURL variable from RVFileFuncs unit.
It is defined as

Code: Select all

type
  TCustomRVIsURLFunction = function(const Word: TRVUnicodeString): Boolean;

const
  RVIsCustomURL: TCustomRVIsURLFunction = nil;
Assign a function that returns True for strings starting with 'tw://'.

Post by jgkoehn »

Greetings Sergey. Thanks for the tip on the local link. I had turned that off.

When I load it in RVF 18.3, in a demo or in the component, save it to RTF, and then look at the code, the RTF for the Unicode link doubles. It looks fine in the RVF viewer, but the underlying code in the RTF itself has the problem.

Post by Sergey Tkachenko »

What do you mean by "doubled"?
Unicode characters may be duplicated by ANSI characters. This is an optional feature; it can be turned off by excluding rvrtfDuplicateUnicode from RichView.RTFOptions.
But in any case, RTF must be correct. RTF readers that understand Unicode ignore duplicate ANSI characters, RTF readers that do not understand Unicode ignore Unicode characters.

Post by jgkoehn »

I will try to send a screenshot. It doubles then doubles again for each load and save. Thank you for all you do.

Post by jgkoehn »

Ah, you are correct. No bug after multiple tests. I think it is the rvrtfDuplicateUnicode behavior you described, which is correct.
Thank you for working through this with us.

Post by jgkoehn »

Ah, I see I misunderstood my co-worker on this one.
Here is the actual situation.
Note these two lines.
Hyperlink from MS Word (edit: after conversion by RVF):

Code: Select all

{\field{\*\fldinst HYPERLINK "tw://[strong]?t=\uc1\u7936 ?\u956 \'b5\u8053 ?\u957 ?\uc0"}{\fldrslt \plain \f6\ul\fs20\cf1 \u7936 }}
Hyperlink we make in our program:

Code: Select all

{\field{\*\fldinst HYPERLINK "tw://[strong]?\uc1\u225 \'e1\u188 \'bc\u8364 \'80\u206 \'ce\u188 \'bc\u225 \'e1\u189 \'bd\u181 \'b5\u206 \'ce\u189 \'bd\uc0"}{\fldrslt \plain \f6\fs20\cf1 \u7936 }}
For some reason, only part of the Unicode from the MS Word link is coming through, and we are not sure why. Please note that this Unicode is polytonic Greek. It looks as if the ANSI is not being fully converted. Is there a setting we have missed?
Last edited by jgkoehn on Sat Mar 28, 2020 9:24 pm, edited 1 time in total.

Post by Sergey Tkachenko »

It's exactly what I described in my previous reply: for each Unicode character, its non-Unicode alternative is written. These non-Unicode characters are ignored by RTF readers that support Unicode in RTF (i.e. all modern rich text editors).
There is no cumulative duplication, just one Unicode character (\uNNN) followed by one non-Unicode character.
As you can see, both TRichView and MS Word write these alternative non-Unicode characters (the '?' characters in MS Word's RTF are such alternatives as well).

If you exclude rvrtfDuplicateUnicode from RTFOptions, these non-Unicode alternative characters will not be written by TRichView. They are not necessary.
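
As an illustration of the rule described above, here is a minimal Python sketch (not TRichView or MS Word code) of how a Unicode-aware reader can treat `\ucN` and `\uNNNN`: each `\u` code is kept and the `N` fallback characters after it are discarded. The function name `decode_rtf_text` and the default `cp1252` code page are assumptions made for the example, and only the control words that appear in the links quoted in this thread are handled.

```python
import re

# One token of RTF text: a \'hh hex escape, a control word with an optional
# numeric parameter and optional delimiting space, or any single character.
# Simplification: escaped braces/backslashes and groups are not handled.
_TOKEN = re.compile(r"\\'([0-9a-fA-F]{2})|\\([a-z]+)(-?\d+)? ?|(.)", re.DOTALL)


def decode_rtf_text(rtf: str, ansi_codepage: str = "cp1252") -> str:
    """Decode a flat run of RTF text, honouring \\ucN and \\uNNNN only."""
    out = []
    uc = 1    # number of fallback units after each \u (RTF default is 1)
    skip = 0  # fallback units still to be discarded
    for m in _TOKEN.finditer(rtf):
        hex_byte, word, param, literal = m.groups()
        if word == "uc" and param is not None:
            uc = int(param)
        elif word == "u" and param is not None:
            code = int(param)
            # \u takes a signed 16-bit value; negative values wrap around.
            out.append(chr(code + 65536 if code < 0 else code))
            skip = uc
        elif word is not None:
            pass  # any other control word is ignored in this sketch
        elif skip > 0:
            skip -= 1  # this '?' or \'hh is only the non-Unicode fallback
        elif hex_byte is not None:
            out.append(bytes.fromhex(hex_byte).decode(ansi_codepage, errors="replace"))
        else:
            out.append(literal)
    return "".join(out)


sample = r"tw://[strong]?t=\uc1\u7936 ?\u956 \'b5\u8053 ?\u957 ?\uc0"
decoded = decode_rtf_text(sample)
print([hex(ord(c)) for c in decoded[-4:]])  # ['0x1f00', '0x3bc', '0x1f75', '0x3bd']
```

The '?' and `\'b5` entries never reach the output, which is why the link target is the same whether or not the fallback characters are written.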

Post by jgkoehn »

Greetings Sergey,
I think I understand. So when RVF reads an MS Word RTF, does it bring part of the code in as '?'?
Here is the same MS Word RTF for that code.

Code: Select all

{\field\fldedit{\*\fldinst {\rtlch\fcs1 \af38 \ltrch\fcs0 \lang2057\langfe1041\langnp2057\insrsid2912444 \hich\af38\dbch\af11\loch\f38  
\hich\af38\dbch\af11\loch\f38 HYPERLINK "tw://[strong]?t=}{\rtlch\fcs1 \af38 \ltrch\fcs0 \lang2057\langfe1041\langnp2057\insrsid2912444 \loch\af38\dbch\af11\hich\f38 \u7936\'3f}{\rtlch\fcs1 \af428 \ltrch\fcs0 
\f428\lang2057\langfe1041\langnp2057\insrsid2912444 \loch\af428\dbch\af11\hich\f428 \'ec}{\rtlch\fcs1 \af38 \ltrch\fcs0 \lang2057\langfe1041\langnp2057\insrsid2912444 \loch\af38\dbch\af11\hich\f38 \u8053\'3f}{\rtlch\fcs1 \af428 \ltrch\fcs0 
\f428\lang2057\langfe1041\langnp2057\insrsid2912444 \loch\af428\dbch\af11\hich\f428 \'ed\loch\f428 "}{\rtlch\fcs1 \af38 \ltrch\fcs0 \lang2057\langfe1041\langnp2057\insrsid2912444 \hich\af38\dbch\af11\loch\f38  }{\rtlch\fcs1 \af38 \ltrch\fcs0 
\lang2057\langfe1041\langnp2057\insrsid2912444 {\*\datafield 
00d0c9ea79f9bace118c8200aa004ba90b0200000003000000e0c9ea79f9bace118c8200aa004ba90b42000000740077003a002f002f005b007300740072006f006e0067005d003f0074003d00001fbc03751fbd030000795881f43b1d7f48af2c825dc485276300000000a5ab0003}}}{\fldrslt {\rtlch\fcs1 \af38 
\ltrch\fcs0 \cs53\ul\cf24\lang2057\langfe1041\langnp2057\insrsid2912444 \loch\af38\dbch\af11\hich\f38 \u7936\'3f}{\rtlch\fcs1 \af428 \ltrch\fcs0 \cs53\f428\ul\cf24\lang2057\langfe1041\langnp2057\insrsid2912444 \loch\af428\dbch\af11\hich\f428 \'ec}{
\rtlch\fcs1 \af38 \ltrch\fcs0 \cs53\ul\cf24\lang2057\langfe1041\langnp2057\insrsid2912444 \loch\af38\dbch\af11\hich\f38 \u8053\'3f}{\rtlch\fcs1 \af428 \ltrch\fcs0 \cs53\f428\ul\cf24\lang2057\langfe1041\langnp2057\insrsid2912444 
\loch\af428\dbch\af11\hich\f428 \'ed}}}

Post by jgkoehn »

Hmm,
This does seem to load correctly in other editors, so I'm wondering whether the other app we are working with needs to be changed.

Post by jgkoehn »

Sorry to take so much of your time, Sergey.
I am confused as to why both of these show the same Greek characters in a Unicode-enabled RTF viewer, yet the second one has quite a few more characters. Is this the doubling you mentioned?
I recognize the RTF Unicode here:
\u7936 \u956 \u8053 \u957 < This line has far fewer codes. (Is this just a different Unicode format?)
\u225 \u188 \u8364 \u206 \u188 \u225 \u189 \u181 \u206 \u189 < This line has many more. (Is this just a different Unicode format?)

Code: Select all

{\field{\*\fldinst HYPERLINK "tw://[strong]?t=\uc1\u7936 ?\u956 \'b5\u8053 ?\u957 ?\uc0"}{\fldrslt \plain \f6\ul\fs20\cf1 \u7936 }}

Code: Select all

{\field{\*\fldinst HYPERLINK "tw://[strong]?\uc1\u225 \'e1\u188 \'bc\u8364 \'80\u206 \'ce\u188 \'bc\u225 \'e1\u189 \'bd\u181 \'b5\u206 \'ce\u189 \'bd\uc0"}{\fldrslt \plain \f6\fs20\cf1 \u7936 }}
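
One possible reading of the numbers, offered as an inference rather than something confirmed in this thread: the longer sequence is what results when the UTF-8 bytes of the same four Greek characters are each read as a separate Windows-1252 character. The Python sketch below only checks that arithmetic; the helper name `undo_utf8_as_ansi` and the choice of code page 1252 are assumptions.

```python
# \u codes from the MS Word link and from the link written by our program.
word_link = [7936, 956, 8053, 957]
app_link = [225, 188, 8364, 206, 188, 225, 189, 181, 206, 189]


def undo_utf8_as_ansi(codes, ansi_codepage="cp1252"):
    """Map each code back to its ANSI byte, then decode the bytes as UTF-8."""
    text = "".join(chr(c) for c in codes)
    return text.encode(ansi_codepage).decode("utf-8")


greek = "".join(chr(c) for c in word_link)
recovered = undo_utf8_as_ansi(app_link)

print([ord(c) for c in recovered])  # [7936, 956, 8053, 957]
print(recovered == greek)           # True
```

If that reading is right, the extra codes are not the \uNNN-plus-fallback duplication; the link target would have been encoded as UTF-8 and then treated as ANSI text before it reached the RTF writer.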

Post by jgkoehn »

Also some additional info from a co-worker.
You need to compare the files in a text editor to see the code. Is it normal to produce three identical HYPERLINK fields for each Greek word, while G281 has only one HYPERLINK?
(Edit: fixed image)
[Image: msword-.jpg]
After saving:
[Image: aftersave.jpg]

Post by Sergey Tkachenko »

In this document, some characters are loaded as separate hyperlinks. I'll try to optimize it.

Post by Sergey Tkachenko »

As I said before, in this RTF, the Unicode hyperlink is loaded in TRichView as several hypertext items. In TRichView, several hypertext items that have the same target are handled like a single hyperlink. When exporting to DocX, these items are saved as a single hyperlink as well. But when exporting to RTF, each item is saved as a separate hyperlink.

I re-checked this RTF. The link is loaded as several items because different characters in it have different Charsets.
TRichView has an option to ignore Charsets from RTF: assign RichView.RTFReadProperties.UseCharsetForUnicode := True. In this mode, RichView.RTFReadProperties.CharsetForUnicode will be applied to all text loaded from RTF, and this hyperlink will be loaded as a single item.

I modified RTF saving code. In the next update, adjacent hypertext items having the same target will be exported to RTF as a single hyperlink.
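
The merging rule described above can be sketched in Python as follows. This is not TRichView's code: the item model (a `(text, target)` tuple, with `None` as the target of plain text) and the name `merge_adjacent_links` are assumptions for illustration, and real items also carry formatting such as the Charset, which this sketch ignores.

```python
from itertools import groupby


def merge_adjacent_links(items):
    """Join runs of adjacent items that share the same hyperlink target.

    items: list of (text, target) tuples; target is None for plain text.
    """
    merged = []
    for target, run in groupby(items, key=lambda item: item[1]):
        run = list(run)
        if target is None:
            merged.extend(run)  # plain text items are left as they are
        else:
            merged.append(("".join(text for text, _ in run), target))
    return merged


items = [
    ("G281 ", None),
    ("a", "tw://x"), ("b", "tw://x"), ("c", "tw://x"),  # one link, three items
    (" ", None),
    ("d", "tw://y"),
]
print(merge_adjacent_links(items))
# [('G281 ', None), ('abc', 'tw://x'), (' ', None), ('d', 'tw://y')]
```

Only adjacent items are merged, so two links with the same target that are separated by plain text stay separate.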

Post by jgkoehn »

Thank you, sir. For now I can offer the setting you suggested as an option for the user. I look forward to the next update. Thanks for all your work. By the way, I work with Jon Graef and Costas Stergiou at theword.net.