bwa

Last published: 2023-04-01 17:08:19

BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM.

  • BWA-backtrack is designed for Illumina sequence reads up to 100bp, while the other two algorithms are for longer sequences ranging from 70bp to 1Mbp.
  • BWA-MEM and BWA-SW share similar features such as long-read support and split alignment, but BWA-MEM, which is the latest, is generally recommended for high-quality queries as it is faster and more accurate. BWA-MEM also has better performance than BWA-backtrack for 70-100bp Illumina reads.
  • github

For 70bp or longer Illumina, 454, Ion Torrent and Sanger reads, assembly contigs and BAC sequences, BWA-MEM is usually the preferred algorithm.
For short sequences, BWA-backtrack may be better.
BWA-SW may have better sensitivity when alignment gaps are frequent.
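
As a rough illustration of the guidance above, the sketch below maps a read length to a BWA subcommand. The function name `suggest_bwa_subcommand` and the hard 70bp cutoff are assumptions made for this example; `bwa mem` and `bwa aln` are the subcommands that run BWA-MEM and BWA-backtrack.

```shell
# Hypothetical helper: suggest a BWA subcommand from a read length in bp.
# The 70bp cutoff is a rule of thumb taken from the text, not a hard limit.
suggest_bwa_subcommand() {
  read_length=$1
  if [ "$read_length" -ge 70 ]; then
    echo "bwa mem"   # BWA-MEM: reads of 70bp or longer
  else
    echo "bwa aln"   # BWA-backtrack: shorter reads
  fi
}

suggest_bwa_subcommand 36    # prints: bwa aln
suggest_bwa_subcommand 150   # prints: bwa mem
```

Output from `bwa aln` still has to be converted to SAM with `bwa samse` (single-end) or `bwa sampe` (paired-end). BWA-SW is run with `bwa bwasw` and is left out of this sketch because the text ties it to gap frequency rather than read length.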

Paper

# bowtie build index
bowtie-build ref1.fa ref1.fa

# bowtie mapping
bowtie ref1.fa test_seq.fq -S

# bowtie2 build index
bowtie2-build ref1.fa ref1.fa

# bowtie2 mapping
bowtie2 -x ref1.fa -U test_seq.fq

# bowtie2 mapping with local
bowtie2 -x ref1.fa -U test_seq.fq --local

# bwa build index
bwa index ref1.fa

# bwa mem mapping
bwa mem ref1.fa test_seq.fq

# bwa mem mapping [change seed length]
bwa mem -k 10 ref1.fa test_seq.fq
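
One way to see why the seed length matters for the test reads: BWA-MEM starts from exact matches of at least `-k` bases (19 by default), so a read whose mismatches are closely spaced may contain no exact match long enough to seed an alignment. The sketch below measures the longest run of identical bases between a read and the reference segment it was derived from. The helper name `longest_exact_run` is made up for this example, and the position-by-position comparison only applies to equal-length, substitution-only reads; it is not BWA's actual seeding procedure.

```shell
# Hypothetical helper: longest run of identical bases between two
# equal-length sequences, compared position by position.
longest_exact_run() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    best = 0; run = 0
    for (i = 1; i <= length(a); i++) {
      if (substr(a, i, 1) == substr(b, i, 1)) {
        run++
        if (run > best) best = run
      } else {
        run = 0
      }
    }
    print best
  }'
}

# seq1 from the test reads, which matches ref1 exactly.
ref_segment=TGAAGCCAAAGAACAAGATGCGCTAGTGGACAGATTGCTGACCAGGGGCTTGAGAGCTG
# seq9 (2 substitutions) and seq11 (6 substitutions) from the test reads.
seq9=TGAAGCCAAAGAGCAAGATGCGCTACTGGACAGATTGCTGACCAGGGGCTTGAGAGCTG
seq11=TGAATCCAAAGAGCAAGATGCGCTACTGGACAGATTGCAGACCAGGGGCCTGAGAGCAG

longest_exact_run "$ref_segment" "$seq9"    # prints 33
longest_exact_run "$ref_segment" "$seq11"   # prints 12
```

Under these assumptions seq11 contains no exact match of 19 bases but does contain one longer than 10, which is the situation the `-k 10` command targets.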

test_seq.fq (test reads):
@seq1 match
TGAAGCCAAAGAACAAGATGCGCTAGTGGACAGATTGCTGACCAGGGGCTTGAGAGCTG
+
JIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJ
@seq2 1gap 26G
TGAAGCCAAAGAACAAGATGCGCTATGGACAGATTGCTGACCAGGGGCTTGAGAGCTG
+
JIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJI
@seq3 2gap 26GT
TGAAGCCAAAGAACAAGATGCGCTAGGACAGATTGCTGACCAGGGGCTTGAGAGCTG
+
JIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJ
@seq4 3gap 26GTG
TGAAGCCAAAGAACAAGATGCGCTAGACAGATTGCTGACCAGGGGCTTGAGAGCTG
+
JIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJ
@seq5 3extra TCGATG
TGAAGCCAAAGAACAAGATGCGCTAGTGGACAGATTGCTGACCAGGGGCTTGAGAGCTGTCGATG
+
JIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJIJJJJIJIJ
@seq6 5extra TCGATG
TCGATGTGAAGCCAAAGAACAAGATGCGCTAGTGGACAGATTGCTGACCAGGGGCTTGAGAGCTG
+
JIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJIJJJJIJIJ
@seq7 gap 26GT 40GA
TGAAGCCAAAGAACAAGATGCGCTAGGACAGATTGCTCCAGGGGCTTGAGAGCTG
+
JIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJI
@seq8 26extra TCGATG 40extra TCGATG
TGAAGCCAAAGAACAAGATGCGCTATCGATGGTGGACAGATTGCTTCGATGGACCAGGGGCTTGAGAGCTG
+
JIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJIJJJJIJIJJJJIJI
@seq9 13AtoG 26GtoC 
TGAAGCCAAAGAGCAAGATGCGCTACTGGACAGATTGCTGACCAGGGGCTTGAGAGCTG
+
JIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJIJJ
@seq10 13AtoG 26GtoC 39TtoA 50TtoC
TGAAGCCAAAGAGCAAGATGCGCTACTGGACAGATTGCAGACCAGGGGCCTGAGAGCTG
+
JIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJIJJ
@seq11 5GtoT 13AtoG 26GtoC 39TtoA 50TtoC 58TtoA
TGAATCCAAAGAGCAAGATGCGCTACTGGACAGATTGCAGACCAGGGGCCTGAGAGCAG
+
JIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJJIJJJJIJIJJ

ref1.fa (reference sequence):
>ref1
AATGATTACGGACCTGAAGCCAAAGAACAAGATGCGCTAGTGGACAGATT
GCTGACCAGGGGCTTGAGAGCTGGGTTCTATTTTCCCTCCTCAAACTGAC
TTTGC

Example alignment summary for a paired-end run (bowtie2 output format):
16182999 reads; of these:
  16182999 (100.00%) were paired; of these:
    5731231 (35.42%) aligned concordantly 0 times
    4522376 (27.95%) aligned concordantly exactly 1 time
    5929392 (36.64%) aligned concordantly >1 times
    ----
    5731231 pairs aligned concordantly 0 times; of these:
      2381431 (41.55%) aligned discordantly 1 time
    ----
    3349800 pairs aligned 0 times concordantly or discordantly; of these:
      6699600 mates make up the pairs; of these:
        3814736 (56.94%) aligned 0 times
        1883429 (28.11%) aligned exactly 1 time
        1001435 (14.95%) aligned >1 times
88.21% overall alignment rate
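
bowtie2 prints a summary in this format to standard error, so it is often saved to a log file. As a sketch of how the counts could be pulled out of such a log for later arithmetic, the helper below prints one `key=value` line per count. The name `parse_alignment_summary` and the key names are assumptions for this example, and the patterns rely only on the wording of the lines shown above.

```shell
# Hypothetical helper: read an alignment summary on stdin and print the
# counts as key=value lines. $1 is the leading number on each line.
parse_alignment_summary() {
  awk '
    /were paired/                         { print "pairs=" $1 }
    /aligned concordantly exactly 1 time/ { print "conc_1=" $1 }
    /aligned concordantly >1 times/       { print "conc_multi=" $1 }
    /aligned discordantly 1 time/         { print "disc_1=" $1 }
    /\) aligned exactly 1 time/           { print "mate_1=" $1 }
    /\) aligned >1 times/                 { print "mate_multi=" $1 }
  '
}

summary_counts=$(parse_alignment_summary <<'EOF'
16182999 reads; of these:
  16182999 (100.00%) were paired; of these:
    5731231 (35.42%) aligned concordantly 0 times
    4522376 (27.95%) aligned concordantly exactly 1 time
    5929392 (36.64%) aligned concordantly >1 times
    ----
    5731231 pairs aligned concordantly 0 times; of these:
      2381431 (41.55%) aligned discordantly 1 time
    ----
    3349800 pairs aligned 0 times concordantly or discordantly; of these:
      6699600 mates make up the pairs; of these:
        3814736 (56.94%) aligned 0 times
        1883429 (28.11%) aligned exactly 1 time
        1001435 (14.95%) aligned >1 times
88.21% overall alignment rate
EOF
)
echo "$summary_counts"
```

The `\)` in the last two patterns anchors on the closing parenthesis of the percentage, which keeps the single-mate lines from also matching the concordant ones.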


The alignment summary has three parts (the first two are usually all you need to read):

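Part 1: concordant alignments in paired-end mode
16182999 (100.00%) were paired: the total number of read pairs.
5731231 (35.42%) aligned concordantly 0 times: pairs whose read 1 and read 2 could not be aligned concordantly to the genome.
4522376 (27.95%) aligned concordantly exactly 1 time: pairs that aligned concordantly, with read 1 and read 2 both aligned and only one such placement.
5929392 (36.64%) aligned concordantly >1 times: pairs that aligned concordantly, with read 1 and read 2 both aligned, but at more than one location.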


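Part 2: discordant alignments in paired-end mode
5731231 pairs aligned concordantly 0 times; of these:
2381431 (41.55%) aligned discordantly 1 time: read 1 and read 2 both align, but not concordantly (for example, the mate orientation is wrong or the insert size is outside the expected range).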


Part 3: single-end alignment of the remaining reads (pairs that aligned neither concordantly nor discordantly)
3349800 pairs aligned 0 times concordantly or discordantly; of these: the remaining pairs, 16182999 - (4522376 + 5929392 + 2381431) = 3349800.
6699600 mates make up the pairs; of these: the total number of reads in those pairs, 3349800 * 2.
3814736 (56.94%) aligned 0 times: reads that did not align.
1883429 (28.11%) aligned exactly 1 time: reads that aligned to exactly one location.
1001435 (14.95%) aligned >1 times: reads that aligned to more than one location.
88.21% overall alignment rate: the overall alignment rate, (4522376*2 + 5929392*2 + 2381431*2 + 1883429 + 1001435) / (16182999*2) = 0.8821.
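
The arithmetic in the three parts can be re-derived from the raw counts. The sketch below does so with shell integer arithmetic and `awk`; the variable names and the `overall_alignment_rate` helper are made up for this example.

```shell
# Counts copied from the summary above.
total_pairs=16182999
conc_1=4522376       # pairs aligned concordantly exactly 1 time
conc_multi=5929392   # pairs aligned concordantly >1 times
disc_1=2381431       # pairs aligned discordantly 1 time
mate_1=1883429       # single mates aligned exactly 1 time
mate_multi=1001435   # single mates aligned >1 times

# Pairs left over for single-end alignment in part three.
remaining_pairs=$((total_pairs - (conc_1 + conc_multi + disc_1)))
echo "$remaining_pairs"   # prints 3349800

# Hypothetical helper: aligned reads divided by total reads, as a percentage.
# Each aligned pair contributes two reads; each aligned single mate one.
overall_alignment_rate() {
  awk -v total="$1" -v c1="$2" -v cm="$3" -v d1="$4" -v m1="$5" -v mm="$6" 'BEGIN {
    aligned_reads = 2 * (c1 + cm + d1) + m1 + mm
    printf "%.2f%%\n", 100 * aligned_reads / (2 * total)
  }'
}

overall_alignment_rate "$total_pairs" "$conc_1" "$conc_multi" \
  "$disc_1" "$mate_1" "$mate_multi"   # prints 88.21%
```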