Back to Jeff Huang's Main Page


Some tips for publishing better-looking papers


Publishing Quality Images

There's a bit of an art to including images in publication materials so they look their best. I searched online but only found shallow articles describing the many file types, and not a guide for how to use them. I hope these notes can lead to fewer charts with compression artifacts, vector graphics that have sadly been saved as pixels, and 10-megabyte pdf files. I've tried to write about the typical scenarios, but there are obviously some exceptions.

There are three different image types, and each one needs a different treatment. There are photographs and digital graphics; within digital graphics, there are pixel-based (raster) graphics and vector graphics. One type should never be converted to another type, as they are best stored in their native format.

Photographs are typically represented as jpeg files. These files are delicate. Every time they are saved, they degrade in quality. A photograph can never be made higher quality, and any change to it lowers its quality. Cropping it, editing it even a bit, or compressing it further are all actions that reduce its quality. Often even opening a photograph and saving it elsewhere will degrade its quality. A low quality jpeg is easy to notice because it has compression artifacts that look like tiny worms near the sharp contrasts in the image. This is because jpeg images are lossy, which means they are approximated with equations rather than having every pixel described in the file. If you are using a photograph in a publication, then always apply all your changes to the original file once, and then save it into the final file. Do not make some edits, save, make more edits, and save again because this means the quality has been reduced twice.

Raster graphics are digitally created, and are pixel-based. For example, this can include screenshots, pixel art, and interface mockups. The best image format for raster graphics is png, which compresses better than gif and has other technical advantages. When using raster graphics, be mindful of the transparency pixel, which is one color in the palette that becomes transparent when overlaid over something else. Always compress your png files before inserting them into the final document, because they can often be reduced in size by 30% or so. There are even online png compression programs out there, so there's really no excuse not to use the smallest possible files in your document.
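
To see why recompressing a png is lossless, here is a minimal Python sketch. It assumes a made-up 64x64 grayscale "screenshot" and hypothetical helper names (`png_chunk`, `encode_gray_png`); it only varies the zlib effort level, whereas real png optimizers also try different scanline filters and palette reductions.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, then CRC-32 over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def encode_gray_png(rows, level: int) -> bytes:
    """Encode rows of 8-bit grayscale pixels as a PNG, deflating at `level`."""
    height = len(rows)
    width = len(rows[0])
    # Every scanline starts with a filter-type byte; 0 means "no filter".
    raw = b"".join(b"\x00" + bytes(row) for row in rows)
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # then compression, filter, and interlace methods, all 0.
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + png_chunk(b"IHDR", header)
        + png_chunk(b"IDAT", zlib.compress(raw, level))
        + png_chunk(b"IEND", b"")
    )


# A made-up 64x64 "screenshot": a light background with a dark square.
rows = [
    [40 if 16 <= x < 48 and 16 <= y < 48 else 230 for x in range(64)]
    for y in range(64)
]

fast = encode_gray_png(rows, level=1)  # quick, lighter compression
best = encode_gray_png(rows, level=9)  # slowest, strongest zlib setting
print(len(fast), len(best))
```

Both files decode to exactly the same pixels; only the number of bytes spent describing them differs, which is why compressing a png never costs any quality.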

Vector graphics are the trickiest to deal with but are also the nicest representation of an image. They are images composed of lines and shapes. Imagine vector graphics as a specification with instructions like "draw a line from the left of the image 2/3s from the bottom to the top right." They will look the same no matter what the resolution, or zoom level, or when printed. It is also a more concise way of describing an image compared to raster graphics, which specifies each pixel. The most common mistake is saving charts as raster graphics instead of vector graphics. Charts, diagrams, logos, wireframes, calendars, and tables should be vector graphics. For example, a chart made in Excel should not be screenshotted or saved as png. Instead, it should be exported or printed to pdf. Always save directly from the application to a vector graphic file type. The most common file types for vector graphics are pdf, svg, and eps. If the application doesn't have an obvious way to export to one of those file types, then try printing to pdf. It's fine to convert between vector graphic file types. Another common mistake with vector graphics is ignoring the canvas (artboard), which produces whitespace around the image. Open the file in something like Illustrator, use the white arrow to find and highlight invisible bits in that white space and delete them, then trim the artboard so that the image fills the artboard exactly.
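
The "draw a line" instruction above can be written out as an actual svg file. This is only an illustrative Python sketch; the function name `line_svg` and the 300x150 canvas are assumptions made for the example.

```python
def line_svg(width: int, height: int) -> str:
    """Return an SVG with one line from the left edge, two thirds of the
    way up from the bottom, to the top-right corner."""
    # SVG's y axis points down, so "2/3 from the bottom" is 1/3 from the top.
    y_start = height / 3
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {width} {height}">\n'
        f'  <line x1="0" y1="{y_start:g}" x2="{width}" y2="0" stroke="black"/>\n'
        "</svg>\n"
    )


svg_text = line_svg(300, 150)
print(svg_text)

# The description stays the same size no matter how large it is rendered,
# while an uncompressed raster copy needs one value per pixel.
print(len(svg_text.encode("utf-8")), "bytes as svg")
print(300 * 150, "pixels as a 300x150 raster")
```

The file holds a single drawing instruction, so it renders crisply at any zoom level, and trimming the canvas amounts to adjusting the `viewBox`.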

Sometimes, you will find that an application you are using can only save or use one type of image, but you have another type. For example, Microsoft PowerPoint on Windows has difficulty loading in any vector graphic. Or some online drawing tools don't allow exporting vector graphics. When this happens, it's pointless to blame the application, but you can still find another application that handles it better. For example, Microsoft PowerPoint on Mac handles loading pdf images just fine, and they can be part of a slide that is exported as pdf again.


Fixing the typography in Microsoft Word

Word generally produces worse typography compared to a typesetting program like LaTeX. I typically write papers in whatever format my co-authors are most familiar with, which is sometimes Word, and have found some tricks to make the typography closer to LaTeX.

Typographic features

The Microsoft Word defaults don't seem to enable some key typography features, maybe to be compatible with older documents, but fortunately this can be fixed quite easily.

  1. Kerning: Kerning adjusts the whitespace between characters in the text. For example, there is a large gap between "T" and "o" when they are next to each other; enabling kerning moves the "o" closer to the "T" and lessens the gap. Word has had this feature for a while but it has to be enabled manually; select all the text in your document, go to Font → Advanced, check the Kerning for fonts checkbox, and put "1" for Points and above. One side benefit of enabling kerning is you often free up a couple of lines in your paper so you can add a few more sentences.
  2. Ligatures: Word 2010 and up supports ligatures for most fonts. Ligatures squeeze two characters together when appropriate. For example, "f" and "i" placed next to each other don't look right because the hood of the "f" almost touches the dot of the "i". I feel ligatures are less important than kerning but I enable them anyways; set Ligatures to Standard only in the same place where you enabled kerning.
  3. Hyphenation: Most conferences and journals require you to Justify the text, which aligns the text to both margins. However, this will occasionally produce lines with a lot of spacing between words, especially in documents with 2 columns, making the text look sparse. Enabling hyphenation allows Word to segment words using a hyphen, eliminating the worst cases of bad word spacing. In the Word toolbar (Ribbon), set Page Layout → Hyphenation to Automatic. If you prefer fewer or more hyphens, you can adjust when they kick in under Hyphenation Options.
  4. Punctuation: There are a few things that make the punctuation in Word look a little nicer.
    1. When writing page numbers in your references, many people use a hyphen, e.g. 179-188. The correct symbol should be an en dash, e.g. 179–188. You can find the en dash under Insert → Symbol → Special Characters.
    2. Word automatically converts all quotes into directional (smart) quotes; this is incorrect for abbreviated years, e.g. '08 for 2008, which should use an apostrophe (a regular single quote) rather than a directional quote.
    3. Instead of putting a bunch of spaces to force line-breaks in your centered titles, use Shift+Enter instead.
    4. Check for accidental double spaces after periods; before I submit, I always search for instances of two spaces "  " and reduce them to one " ".
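
The en dash and double-space checks can also be run on plain text, for example a reference list pasted out of Word. The following Python sketch is one way to do that; the function names are hypothetical, and the range rule is deliberately naive (it would also touch dates or ISBNs written with hyphens), so it is best applied to page ranges only.

```python
import re

EN_DASH = "\u2013"


def fix_page_ranges(text: str) -> str:
    """Replace a hyphen sitting directly between two digits with an en dash."""
    return re.sub(r"(?<=\d)-(?=\d)", EN_DASH, text)


def collapse_double_spaces(text: str) -> str:
    """Reduce any run of two or more spaces to a single space."""
    return re.sub(r" {2,}", " ", text)


reference = "In Proc. CHI '08,  179-188.  ACM Press."
cleaned = collapse_double_spaces(fix_page_ranges(reference))
print(cleaned)
```

In Word itself, the same double-space check is just Find and Replace with two spaces in the search box and one in the replacement.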

Citations

When using the ACM or IEEE citation format, e.g. [23] or [6,11,32], you probably don't want to re-number every citation whenever you insert a new reference. LaTeX has a nice BibTeX system for handling this automatically, but you can get similar functionality using Word's cross-references.

  1. Cross-references: To cite a reference, go to Insert → Cross-reference in the Word toolbar (Ribbon). Make sure the Reference type is "Numbered item" and Insert reference to is "Paragraph number" and find your reference. It should insert something like [23] which is linked to the actual reference. When you update your list of references, your citations are updated automatically when you select all and press F9.
  2. Multiple cross-references: Multiple citations show up as [6][11][32] by default, which is not the correct format for ACM and IEEE. To fix this, right-click on the citation number and Toggle Field Codes and add "\# 0" after the reference, e.g. "REF _Ref261299636 \r \h \* MERGEFORMAT" becomes "REF _Ref261299636 \# 0 \r \h \* MERGEFORMAT". This removes the brackets around the citation number, and you can add your own brackets and commas to make it look like [6,11,32] while maintaining the reference link.
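
The field code edit in step 2 is a small string transformation. The Python sketch below only illustrates that transformation on the field code text; it does not automate Word, and the function names are made up for the example.

```python
import re


def add_numeric_switch(field_code: str) -> str:
    """Insert the numeric-format switch after the bookmark name of a REF field."""
    switch = r"\# 0"
    if switch in field_code:
        return field_code  # already patched, leave it alone
    # Match "REF <bookmark>" at the start and append the switch to it.
    return re.sub(
        r"^(\s*REF\s+\S+)",
        lambda match: f"{match.group(1)} {switch}",
        field_code,
        count=1,
    )


def format_citation_group(numbers) -> str:
    """Join bare citation numbers into one bracketed group such as [6,11,32]."""
    return "[" + ",".join(str(n) for n in numbers) + "]"


before = r"REF _Ref261299636 \r \h \* MERGEFORMAT"
after = add_numeric_switch(before)
print(after)
print(format_citation_group([6, 11, 32]))
```

In the document, each number inside the brackets remains its own linked field; only the brackets and commas are typed by hand.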

Distilling to pdf

Making the pdf for submission is an important step since the final pdf is the only thing that will be seen and archived. The objective here is to make the smallest and best-looking pdf possible.

  1. Unused fonts: Many Word documents accidentally use a couple of extra fonts in one or two instances, maybe from a copy+paste. You will probably not need more than 2 fonts per document (a serif like Times New Roman and sans-serif like Arial). Use this script to generate a list of fonts used in the document and if you see some unfamiliar font, find where it's being used and replace it. Not only does this cut down on the final pdf file size, but also reduces font dependencies.
  2. Distill: The pdf will be different depending on the program used to make it. I always compare the pdf distilled using 3 different methods and keep the nicest one: (1) Save As PDF in Word, (2) Print to Adobe PDF Printer, and (3) Save to file in Word using any printer, then use the Ghostscript ps2pdf tool to convert it to pdf. I usually get better results by printing to an Adobe PDF Printer.
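
As a rough alternative way to spot stray fonts (this is not the script mentioned in step 1), a .docx file is a zip archive whose main text lives in word/document.xml, so the fonts applied directly to runs of text can be counted with a few lines of Python. The sketch below is a minimal example under that assumption; `fonts_in_docx` and `make_sample_docx` are hypothetical names, the sample document is a stand-in built in memory, and fonts that come only from styles, themes, headers, or footers are not covered.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
FONT_ATTRS = ("ascii", "hAnsi", "eastAsia", "cs")


def fonts_in_docx(path_or_file) -> Counter:
    """Count font names set directly on runs (w:rFonts) in the document body."""
    with zipfile.ZipFile(path_or_file) as docx:
        root = ET.fromstring(docx.read("word/document.xml"))
    counts = Counter()
    for rfonts in root.iter(f"{{{W_NS}}}rFonts"):
        # One element may repeat the same font for several scripts; count it once.
        names = {rfonts.get(f"{{{W_NS}}}{attr}") for attr in FONT_ATTRS}
        names.discard(None)
        counts.update(names)
    return counts


def make_sample_docx() -> io.BytesIO:
    """Build a tiny stand-in .docx in memory (not a complete Word file)."""
    def run(font: str, text: str) -> str:
        return (
            f'<w:r><w:rPr><w:rFonts w:ascii="{font}" w:hAnsi="{font}"/></w:rPr>'
            f"<w:t>{text}</w:t></w:r>"
        )

    body = (
        run("Times New Roman", "Body text. ")
        + run("Times New Roman", "More body text. ")
        + run("Calibri", "Pasted sentence.")
    )
    xml = f'<w:document xmlns:w="{W_NS}"><w:body><w:p>{body}</w:p></w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as docx:
        docx.writestr("word/document.xml", xml)
    return buffer


for font, count in fonts_in_docx(make_sample_docx()).most_common():
    print(f"{count:3d}  {font}")
```

On a real paper, you would pass the path of the .docx file to `fonts_in_docx`; a font that appears only once or twice is a good candidate for a leftover from copy and paste.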
