How can I sort -unique every other line in Linux?

I have a FASTA file and I want to remove redundant entries based only on the sequence (the even-numbered lines), not the header.

>headerX
**SEQUENCE1**
>headerY
SEQUENCE2
>headerZ
**SEQUENCE1**

I want to get rid of the duplicated sequence (SEQUENCE1).

Your question is very hard for mortals to understand. Could you try to make it clearer to non-FASTA, bio-humans please? Your title mentions "every other line" - are you showing us the line that needs changing or the other one? Or is it a trick?
– Mark Setchell
yesterday

It's not clear what output you expect. Would you drop both lines >headerZ and **SEQUENCE1** because sequence1 is already under >headerX ? --or-- would you keep >headerZ, with no line under it? Please provide a sample output for the sample input.
– Stephen P
yesterday

Please avoid "Give me the codez" questions. Instead show the script you are working on and state where the problem is. Also see How much research effort is expected of Stack Overflow users?
– jww
yesterday

1 Answer

You can use sed for this:

sed -n 2~2p data.fasta | sort -u

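This prints every even-numbered line of data.fasta (the sequence lines) and pipes them to sort -u, which sorts them and removes duplicates. The 2~2 address (start at line 2, step by 2) is a GNU sed extension, and the output contains only sequences, not headers.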
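If the headers need to survive, one possible variant is to handle header and sequence as a pair. The sketch below assumes every record is exactly two lines (header, then sequence) and that the first header seen for each distinct sequence should be kept, which the question does not confirm. The file name sample.fasta and the function name dedup_fasta are made up for illustration.

```shell
# Hypothetical sample input; file name and contents are illustrative only.
printf '%s\n' '>headerX' 'SEQUENCE1' '>headerY' 'SEQUENCE2' '>headerZ' 'SEQUENCE1' > sample.fasta

# Assumes strict two-line records: odd lines are headers, even lines are sequences.
# Odd line: remember the header. Even line: print the header/sequence pair
# only the first time this sequence is seen.
dedup_fasta() {
  awk 'NR % 2 { header = $0; next } !seen[$0]++ { print header; print $0 }' "$1"
}

dedup_fasta sample.fasta
```

With the sample above this keeps the headerX and headerY records and drops headerZ, because its sequence was already seen. Unlike sort -u, it also preserves the original record order.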
