Yinette's Webshite

A collection of security stuff and all sorts of other random shit.

Word 2010's Bizarre Take on Urlencodes and How to Fix It in Nginx

I came across a really odd corner case in a customer ticket today. I couldn't find anything about this problem that involved rewrites, so here it is! My first real kinda non-infosec post. Shoutout to all sysadmins and ops in the world, the struggle is real! <3

The Problem

In Microsoft Word 2010, URLs that have been pasted into a document get hyperlinked automatically. However, for reasons I cannot find any reasonable explanation for, this is what happens:

Example URL:

http://site.com/content#subcontent

What Word passes to the default browser:

http://site.com/content%20-%20subcontent

Yay.

Basically # gets turned into %20-%20.

A desperate search on the intertubes revealed that you can actually implement a registry hack to fix this. In this case, that is simply not an option. Word documents with macros that change registry values = malware. Idk what it does, but it's doing the WRONG thing.

The Solution

After a good hour making some of the most monstrous regular expressions I think I’ve ever made, I finally started getting somewhere.

In the end, this rewrite rule was born:

rewrite ^(.*)\ -\ (.*)$ $1#$2 redirect;

Note! Nginx automatically decodes the %20 to a space before the URI hits the rewrite, which is why the regex matches literal spaces rather than %20.
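For context, here is a rough sketch of where the rule could live in a config. The listen port, log path, and document root are assumptions for illustration, not the customer's actual setup. The rewrite_log directive, together with an error_log at notice level, is what makes nginx log each rewrite match.

```nginx
server {
    listen 80;                # assumed port
    server_name localhost;

    # Assumed log path; the "notice" level is needed to see rewrite_log output
    error_log /var/log/nginx/error.log notice;
    rewrite_log on;

    # The URI is already decoded here, so "%20-%20" shows up as " - ".
    # Everything before " - " goes into $1, everything after into $2.
    rewrite ^(.*)\ -\ (.*)$ $1#$2 redirect;

    location / {
        root /usr/share/nginx/html;   # assumed document root
    }
}
```

Two things to keep in mind: the first (.*) is greedy, so a URI with more than one " - " gets split at the last one, and any legitimate URI that happens to contain " - " will be redirected too.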

Here it is in action!

curl -v 'http://localhost/index.html%20-%20thingy'

2015/08/26 13:26:58 [notice] 17521#0: *67 "^(.*)\ -\ (.*)$" matches "/index.html - thingy", client: 127.0.0.1, server: localhost, request: "GET /index.html%20-%20thingy HTTP/1.1", host: "localhost"

< HTTP/1.1 302 Moved Temporarily
< Location: http://localhost/index.html#thingy

Yay!