4.27. How do I change all paragraphs to long lines?
A frequent request is how to convert DOS-style textfiles, in which
each line ends with a "paragraph marker", to Microsoft-style
textfiles, in which the "paragraph marker" appears only at the end
of real paragraphs. Sometimes this question is framed as, "How do I
remove the hard returns at the end of each line in a paragraph?"

The problem occurs because newer word processors don't work the
same way older text editors did. Older text editors used a newline
(CR/LF in DOS; LF alone in Unix) to end each line on screen or on
disk, and used two newlines to separate paragraphs. Certain word
processors were designed to make paragraph reformatting and
reflowing easy, so they use one newline to end a paragraph and
never allow newlines within a paragraph. This means that textfiles
created with standard editors (Emacs, vi, Vedit, Boxer, etc.)
appear to have "hard returns" at inappropriate places when they are
opened in such a word processor.

The following sed script finds blocks of consecutive nonblank lines
(i.e., paragraphs of text), and converts each block into one long
line with one "hard return" at the end.
    # sed script to change all paragraphs to long lines
    /./{H; $!d;}             # Put each paragraph into hold space
    x;                       # Swap hold space and pattern space
    s/^\(\n\)\(..*\)$/\2\1/; # Move leading \n to end of PatSpace
    s/\n\(.\)/ \1/g;         # Replace all other \n with 1 space
    # Uncomment the following line to remove excess blank lines:
    # /./!d;
    #---end of sed script---
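As a quick check of the script above, the following shell sketch
feeds it a small made-up sample (two hard-wrapped paragraphs) and
captures the result. The sample text and the variable names are
illustrative assumptions, and the commands are passed as separate -e
expressions instead of being read from a file with -f. Like the
script itself, it assumes a sed in which both "\n" and "." can match
a newline embedded in the pattern space (GNU sed behaves this way).

```shell
# Hypothetical sample input: two paragraphs of hard-wrapped lines,
# separated by one blank line.
input='first line
second line

third line
fourth line'

# The same four commands as the script, one per -e expression.
result=$(printf '%s\n' "$input" | sed -e '/./{H;$!d;}' \
    -e 'x' \
    -e 's/^\(\n\)\(..*\)$/\2\1/' \
    -e 's/\n\(.\)/ \1/g')

# Show the converted text (the shell drops the final trailing newlines).
printf '%s\n' "$result"
```

If all goes well, each paragraph comes out as a single long line, and
the blank line between the two paragraphs is kept.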
If the input files have formatting or indentation that conveys
special meaning (like program source code), this script will remove
it. If such text still needs to be converted to long lines, try
'par' (paragraph reformatter) or the 'fmt' utility with the -t or -c
switches and the width option (-w) set to a large number like 9999.
Some versions of fmt cap the width and reject a value that large; in
that case, use the largest width they accept.
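For comparison, here is a minimal sketch of the fmt approach on a
small made-up sample. It assumes a fmt that accepts -w (the GNU and
BSD versions do) and uses a width of 2500 instead of 9999 because
GNU fmt, for one, refuses larger widths. The -t and -c switches are
left out because the sample has no indentation; the sample text and
variable names are illustrative only.

```shell
# Hypothetical sample input: two short hard-wrapped paragraphs.
input='one two
three

four
five six'

# Ask fmt for lines far wider than any paragraph in the sample,
# so that each paragraph is refilled onto a single line.
joined=$(printf '%s\n' "$input" | fmt -w 2500)

printf '%s\n' "$joined"
```

Unlike the sed script, fmt refills text to a target width, so a
paragraph longer than the chosen width will still be broken across
several lines.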