Issue 126946 - URL Recognition Failures
Summary: URL Recognition Failures
Status: CONFIRMED
Alias: None
Product: Impress
Classification: Application
Component: ui
Version: 4.1.1
Hardware: All All
Importance: P5 (lowest) Minor
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-04-30 18:30 UTC by Lalith Ramesh
Modified: 2016-05-01 19:55 UTC

See Also:
Issue Type: ENHANCEMENT
Latest Confirmation in: 4.1.2
Developer Difficulty: Research


Attachments

Description Lalith Ramesh 2016-04-30 18:30:37 UTC
Overview:
The URL recognition code doesn't recognize all URL formats.  URLs of the form www.oracle.com are automatically recognized and a hyperlink is created.  URLs of the form java.oracle.com are not.

Steps to Reproduce:
1)  Start Impress.
2)  Enter the URLs www.oracle.com and java.oracle.com, each on a separate line.

Actual Results:
Only www.oracle.com is underlined and has a hyperlink.

Expected Results:
Both URLs should be underlined and have hyperlinks.

Build Date & Hardware:
Build 2014-08-13 on Intel Core 2 Duo

Additional Builds and Platforms:
N/A

Additional Information:
N/A
Comment 1 orcmid 2016-05-01 01:44:36 UTC
(In reply to Lalith Ramesh from comment #0)
> form www.oracle.com are automatically recognized and a hyperlink is created.
> URLs of the form java.oracle.com are not.
> Both URLs should be underlined and have hyperlinks.

It is interesting that my email reader does the same thing with the plaintext email I received from the Bugzilla system when it sent me the text of this newly-opened issue.

Here's what I find interesting.  *Neither* of those forms is in official URL format.  There is a standard for URLs that neither satisfies.  They both look like domain names but need not be such, and need not be accessible as web sites.

Technically, turning www.oracle.com into a URL (apparently because of the www) is a hack, programmed in on some idea of the likely user intent.  These days, https://www.oracle.com might actually be required (if not for that domain, then for some other www.*.* one).

To be consistent, neither should be automatically turned into links.  There are ways to make links, and they involve entering a correct URL as the target for whatever text one wants to turn into a link.

There are similar hacks around forms such as root@bz.apache.org, which some software will turn into the URL form mailto:root@bz.apache.org even when that is not what is intended.  Then it is necessary to convince that software to remove the link it introduced.
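
For illustration only, the two hacks described above can be sketched in a few lines of Python.  This is not the OpenOffice AutoCorrect code; the function name guess_link, both patterns, and the choice of an http:// prefix are assumptions made up to mimic the behavior reported in this issue.

```python
import re

# Hypothetical patterns, chosen only to mimic the behavior reported here.
# A host name is linked only when it starts with the literal label "www.".
WWW_HOST = re.compile(r"^www\.[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")
# Anything shaped like name@host.domain is treated as an email address.
EMAIL_LIKE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")


def guess_link(text):
    """Return a guessed URL for text, or None if neither heuristic applies."""
    if WWW_HOST.match(text):
        # Assumed scheme; the real target might need to be https://.
        return "http://" + text
    if EMAIL_LIKE.match(text):
        return "mailto:" + text
    return None


for sample in ("www.oracle.com", "java.oracle.com", "root@bz.apache.org"):
    print(sample, "->", guess_link(sample))
```

Under these assumptions, java.oracle.com falls through both rules and is left as plain text, which matches what the reporter observed.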

The problem is that different users have different expected behaviors.  And it can be very difficult to discourage software from automatically inferring a link when it is not desired.

I think we need to consider 

 1. Leaving Impress as it is, so long as there is a way to force the link to be removed.

 2. Removing the automatic link creation so that all link creation is by intentional use of the mechanism already available for that.

Any scheme for making more of them automatically will always miss some and introduce, for some users, unpleasant false positives.

Thoughts?
Comment 2 Lalith Ramesh 2016-05-01 18:09:04 UTC
(In reply to orcmid from comment #1)
> 
> I think we need to consider 
> 
>  1. Leaving Impress as it is, so long as there is a way to force the link to
> be removed.
> 
>  2. Removing the automatic link creation so that all link creation is by
> intentional use of the mechanism already available for that.
> 
> Any scheme for making more of them automatically will always miss some and
> introduce, for some users, unpleasant false positives.
> 
> Thoughts?

Manual link creation was also suggested as a workaround on the <a href="https://forum.openoffice.org/en/forum/viewtopic.php?f=10&t=82983#p384616">User Support forum</a>.  BTW, I'm using Gmail, and it's smart enough to recognize both URLs.
Comment 3 Lalith Ramesh 2016-05-01 18:12:04 UTC
(In reply to Lalith Ramesh from comment #2)
> (In reply to orcmid from comment #1)
> > 
> > I think we need to consider 
> > 
> >  1. Leaving Impress as it is, so long as there is a way to force the link to
> > be removed.
> > 
> >  2. Removing the automatic link creation so that all link creation is by
> > intentional use of the mechanism already available for that.
> > 
> > Any scheme for making more of them automatically will always miss some and
> > introduce, for some users, unpleasant false positives.
> > 
> > Thoughts?
> 
> Manual link creation was also suggested as a workaround on the <a
> href="https://forum.openoffice.org/en/forum/viewtopic.
> php?f=10&t=82983#p384616">User Support forum</a>.  BTW, I'm using Gmail, and
> it's smart enough to recognize both URLs.

My apologies for the link formatting issues.  Here's the User Support forum link:

https://forum.openoffice.org/en/forum/viewtopic.php?f=10&t=82983#p384616
Comment 4 orcmid 2016-05-01 19:48:06 UTC
(In reply to Lalith Ramesh from comment #3)
> > Manual link creation was also suggested as a workaround on the <a
> > href="https://forum.openoffice.org/en/forum/viewtopic.
> > php?f=10&t=82983#p384616">User Support forum</a>.  BTW, I'm using Gmail, and
> > it's smart enough to recognize both URLs.

> https://forum.openoffice.org/en/forum/viewtopic.php?f=10&t=82983#p384616

Thanks for that link.

I nosed around a little further and determined that automatic link creation is an AutoCorrect feature, controlled at Tools > AutoCorrect Options > Options > URL Recognition.  It is on by default.  

This applies across all components.  I confirm the described behavior in Writer and Calc, not just Impress.

The reliance on the subdomain "www." appears to be by design and is consistent with how web sites have been promoted in the past.

There is no "correct" approach here, so it is not a technical defect.  If AutoCorrect's recognition of text that should be turned into links is to be taken further, that calls for an enhancement of this feature.

I have changed the Issue Type, the Hardware and OS, and the Importance.

The Developer Difficulty is set to Research because we have the serious problem of what the algorithm needs to be and how to prevent it from producing too many false positives.  When a human sees one of these in context, it may be obvious that a link is desired and what the actual URL needs to be.  But creating more false positives, to the point that casual users turn the feature off (if they know how), is an issue to be considered.  Also, creating unnoticed false positives during authoring is not a great thing.
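
A rough Python sketch of the false-positive problem follows.  It is only an illustration, not a proposed algorithm: the pattern, the function names looks_like_host and looks_like_host_strict, and the tiny KNOWN_TLDS allowlist are all hypothetical.

```python
import re

# Hypothetical broader rule: any dot-separated name whose last label is
# two or more letters.  This accepts java.oracle.com without needing "www.".
DOTTED_NAME = re.compile(r"^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}$")

# Hypothetical, deliberately tiny allowlist of top-level domains.
KNOWN_TLDS = {"com", "org", "net"}


def looks_like_host(text):
    """Broad rule: true for anything shaped like a dotted name."""
    return DOTTED_NAME.match(text) is not None


def looks_like_host_strict(text):
    """Broad rule plus a check of the last label against the allowlist."""
    if not looks_like_host(text):
        return False
    last_label = text.rsplit(".", 1)[1].lower()
    return last_label in KNOWN_TLDS


for sample in ("java.oracle.com", "report.txt", "command.com"):
    print(sample, looks_like_host(sample), looks_like_host_strict(sample))
```

The broad rule accepts java.oracle.com but also the file name report.txt.  The allowlist rejects report.txt, yet still accepts command.com, which may just as well be a file name as a web site, so some false positives survive either way.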

How can we resolve this?

Also, note that even if we come up with a reasonable improvement to the AutoCorrect case, no action will be taken until a developer determines this is something that is worth their voluntary attention.
Comment 5 orcmid 2016-05-01 19:55:45 UTC
(In reply to orcmid from comment #4)
> (In reply to Lalith Ramesh from comment #3)
> > > Manual link creation was also suggested as a workaround on the <a
> > > href="https://forum.openoffice.org/en/forum/viewtopic.
> > > php?f=10&t=82983#p384616">User Support forum</a>.  BTW, I'm using Gmail, and
> > > it's smart enough to recognize both URLs.
> 
> > https://forum.openoffice.org/en/forum/viewtopic.php?f=10&t=82983#p384616
> 
> Thanks for that link.
> 
> I nosed around a little further and determined that automatic link creation
> is an AutoCorrect feature, controlled at Tools > AutoCorrect Options >
> Options > URL Recognition.  It is on by default.  
> 
> This applies across all components.  I confirm the described behavior in
> Writer and Calc, not just Impress.
> 
> The reliance on the subdomain "www." appears to be by design and is
> consistent with how web sites have been promoted in the past.
[ ... ]

I just tested orcmid@apache.org and see that it is automatically corrected to a link with URI mailto:orcmid.apache.org and I can revoke that correction by using the context-menu (right-click) Remove Hyperlink selection.  Then the address shows up in the spelling checker [;<).