bowtie2

Last published: 2025-05-03 12:48:11

Basic commands

bowtie2-build ref1.fa bowtie2/ref1.fa
bowtie2 -x bowtie2/ref1.fa -U reads_1.fq
bowtie2 -x bowtie2/ref1.fa -1 reads_1.fq -2 reads_2.fq
docker run -it --rm -v "$PWD":"$PWD" -w "$PWD" --user 1011:1011 registry.cn-hangzhou.aliyuncs.com/sj-bioinfo/bowtie2:2.5.1 bowtie2-build GRCh38.p14.genome.fa GRCh38.p14.genome.fa
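
When these commands are scripted, the main thing to keep straight is that `-x` takes the index prefix given to `bowtie2-build`, and that single-end input uses `-U` while paired-end input uses `-1`/`-2`. The sketch below only assembles the argument list and does not run anything. The helper name `bowtie2_command` and the output file `aln.sam` are hypothetical; `-S` (SAM output file) and `-p` (threads) are standard Bowtie 2 options that the commands above do not use.

```python
import shlex


def bowtie2_command(index, reads1, reads2=None, sam_out=None, threads=None):
    """Assemble a bowtie2 argument list without executing it (hypothetical helper)."""
    cmd = ["bowtie2", "-x", index]               # index prefix from bowtie2-build
    if reads2 is None:
        cmd += ["-U", reads1]                    # unpaired (single-end) reads
    else:
        cmd += ["-1", reads1, "-2", reads2]      # mate 1 and mate 2 files
    if sam_out is not None:
        cmd += ["-S", sam_out]                   # write SAM to a file instead of stdout
    if threads is not None:
        cmd += ["-p", str(threads)]              # number of alignment threads
    return cmd


single = bowtie2_command("bowtie2/ref1.fa", "reads_1.fq")
paired = bowtie2_command("bowtie2/ref1.fa", "reads_1.fq", "reads_2.fq",
                         sam_out="aln.sam", threads=4)
print(shlex.join(single))
print(shlex.join(paired))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where `bowtie2` is installed.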

Key parameters

Presets:

  • For --end-to-end:
    • --very-fast -D 5 -R 1 -N 0 -L 22 -i S,0,2.50
    • --fast -D 10 -R 2 -N 0 -L 22 -i S,0,2.50
    • --sensitive -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default)
    • --very-sensitive -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
  • For --local:
    • --very-fast-local -D 5 -R 1 -N 0 -L 25 -i S,1,2.00
    • --fast-local -D 10 -R 2 -N 0 -L 22 -i S,1,1.75
    • --sensitive-local -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default)
    • --very-sensitive-local -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
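
Each preset is shorthand for a fixed combination of `-D`, `-R`, `-N`, `-L` and `-i`. A minimal sketch of that mapping, using only the values listed above, makes it easy to compare presets programmatically. The names `PRESETS`, `expand_preset` and `preset_value` are hypothetical.

```python
# Preset expansions copied from the list above.
PRESETS = {
    "--very-fast": "-D 5 -R 1 -N 0 -L 22 -i S,0,2.50",
    "--fast": "-D 10 -R 2 -N 0 -L 22 -i S,0,2.50",
    "--sensitive": "-D 15 -R 2 -N 0 -L 22 -i S,1,1.15",
    "--very-sensitive": "-D 20 -R 3 -N 0 -L 20 -i S,1,0.50",
    "--very-fast-local": "-D 5 -R 1 -N 0 -L 25 -i S,1,2.00",
    "--fast-local": "-D 10 -R 2 -N 0 -L 22 -i S,1,1.75",
    "--sensitive-local": "-D 15 -R 2 -N 0 -L 20 -i S,1,0.75",
    "--very-sensitive-local": "-D 20 -R 3 -N 0 -L 20 -i S,1,0.50",
}


def expand_preset(name):
    """Return the explicit option list that a preset stands for."""
    return PRESETS[name].split()


def preset_value(name, flag):
    """Return the value a preset assigns to one option, e.g. '-L'."""
    opts = expand_preset(name)
    return opts[opts.index(flag) + 1]


for name in ("--sensitive", "--very-sensitive"):
    print(name, "seed length:", preset_value(name, "-L"),
          "max failed extends:", preset_value(name, "-D"))
```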

Alignment:

  • -N: max # mismatches in seed alignment; can be 0 or 1 (0)
  • -L: length of seed substrings; must be >3, < 32 (22)
  • -i: interval between seed substrings w/r/t read len (S,1,1.15)
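
The `-i` value is a function of the read length rather than a fixed number: `S,1,1.15` means f(x) = 1 + 1.15 * sqrt(x), where x is the read length, and results below 1 are raised to 1. The sketch below evaluates such a specification for the square-root (`S`), linear (`L`) and natural-log (`G`) forms. The helper name `seed_interval` is hypothetical, and how Bowtie 2 rounds the result to a whole number of bases internally is not modeled here.

```python
import math


def seed_interval(func, read_len):
    """Evaluate a Bowtie 2 style function spec such as 'S,1,1.15' for a read length."""
    kind, const, coeff = func.split(",")
    transforms = {
        "L": lambda x: x,      # linear in read length
        "S": math.sqrt,        # square root of read length
        "G": math.log,         # natural log of read length
    }
    if kind not in transforms:
        raise ValueError(f"unsupported function type in this sketch: {kind}")
    value = float(const) + float(coeff) * transforms[kind](read_len)
    return max(value, 1.0)     # values below 1 are raised to 1


for length in (50, 100, 150):
    print(length, f"{seed_interval('S,1,1.15', length):.2f}")
```

With the default `S,1,1.15`, longer reads get more widely spaced seeds, so the number of seeds grows more slowly than the read length.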

Effort:

  • -D <int>: give up extending after <int> failed extends in a row (15)
  • -R <int>: for reads w/ repetitive seeds, try <int> sets of seeds (2)

  • Bowtie 2
    • Bowtie 2 is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes.
    • Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB.
    • Bowtie 2 supports gapped, local, and paired-end alignment modes.

Example alignment summary from a paired-end run:
63778510 reads; of these:
  63778510 (100.00%) were paired; of these:
    5444376 (8.54%) aligned concordantly 0 times
    48955563 (76.76%) aligned concordantly exactly 1 time
    9378571 (14.70%) aligned concordantly >1 times
    ----
    5444376 pairs aligned concordantly 0 times; of these:
      821305 (15.09%) aligned discordantly 1 time
    ----
    4623071 pairs aligned 0 times concordantly or discordantly; of these:
      9246142 mates make up the pairs; of these:
        8363217 (90.45%) aligned 0 times
        399654 (4.32%) aligned exactly 1 time
        483271 (5.23%) aligned >1 times
93.44% overall alignment rate
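Overall alignment rate, counted per mate (63778510*2 = 127557020 mates in total):
((48955563 + 9378571 + 821305)*2 + 399654 + 483271)/127557020 = 0.9344354626660297

Mates that did not align at all:
(63778510*2) - ((48955563 + 9378571 + 821305)*2 + 399654 + 483271) = 8363217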
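
The same arithmetic can be reproduced directly from the log text. The sketch below is a minimal parser that assumes the paired-end summary layout shown above (a single-end summary has different lines); the names `SUMMARY` and `count` are hypothetical.

```python
import re

SUMMARY = """\
63778510 reads; of these:
  63778510 (100.00%) were paired; of these:
    5444376 (8.54%) aligned concordantly 0 times
    48955563 (76.76%) aligned concordantly exactly 1 time
    9378571 (14.70%) aligned concordantly >1 times
    ----
    5444376 pairs aligned concordantly 0 times; of these:
      821305 (15.09%) aligned discordantly 1 time
    ----
    4623071 pairs aligned 0 times concordantly or discordantly; of these:
      9246142 mates make up the pairs; of these:
        8363217 (90.45%) aligned 0 times
        399654 (4.32%) aligned exactly 1 time
        483271 (5.23%) aligned >1 times
93.44% overall alignment rate
"""


def count(pattern, text=SUMMARY):
    """Return the integer at the start of the first line matching `pattern`."""
    match = re.search(r"^[ \t]*(\d+) " + pattern, text, flags=re.MULTILINE)
    if match is None:
        raise ValueError(f"line not found: {pattern}")
    return int(match.group(1))


pairs = count(r"reads; of these")
conc_1 = count(r"\(\S+\) aligned concordantly exactly 1 time")
conc_n = count(r"\(\S+\) aligned concordantly >1 times")
disc_1 = count(r"\(\S+\) aligned discordantly 1 time")
mate_0 = count(r"\(\S+\) aligned 0 times")
mate_1 = count(r"\(\S+\) aligned exactly 1 time")
mate_n = count(r"\(\S+\) aligned >1 times")

total_mates = 2 * pairs
# Every aligned pair contributes two mates; leftover mates are counted singly.
aligned_mates = 2 * (conc_1 + conc_n + disc_1) + mate_1 + mate_n
rate = aligned_mates / total_mates

print(f"aligned mates: {aligned_mates} of {total_mates}")
print(f"overall alignment rate: {100 * rate:.2f}%")
print(f"unaligned mates: {total_mates - aligned_mates}")
```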

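The following is a detailed interpretation of the Bowtie 2 alignment summary above:

I. Overall statistics

  • Total reads
    63,778,510 reads: the paired-end input contains about 63.78 million read pairs (two mates per pair, 127,557,020 mates in total). On this line Bowtie 2 counts each pair as one "read".

  • Alignment rate
    93.44% overall alignment rate: 93.44% of all mates aligned to the reference genome at least once, whether as part of a concordant pair, as part of a discordant pair, or as an individual mate.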

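II. Paired-end alignment categories

  1. Concordant alignment
  • Not aligned concordantly
    5,444,376 (8.54%) aligned concordantly 0 times:
    About 5.44 million pairs (8.54%) had no alignment in which both mates matched the expected relative orientation and distance.

  • Uniquely aligned
    48,955,563 (76.76%) aligned concordantly exactly 1 time:
    About 48.96 million pairs (76.76%) aligned concordantly at exactly one location, satisfying the paired-end constraints (such as the --fr orientation).

  • Multiply aligned
    9,378,571 (14.70%) aligned concordantly >1 times:
    About 9.38 million pairs (14.70%) aligned concordantly at more than one location in the genome.

  2. Discordant alignment
  • Pairs recovered among those with no concordant alignment
    821,305 (15.09%) aligned discordantly 1 time:
    Of the 5.44 million pairs without a concordant alignment, 821,305 (15.09%) aligned discordantly: both mates aligned uniquely, but the pair did not satisfy the concordance constraints (for example, wrong orientation or fragment length).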
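
III. Single-mate alignment statistics

  • Not aligned as a pair
    4,623,071 pairs aligned 0 times concordantly or discordantly:
    The remaining 4.62 million pairs aligned neither concordantly nor discordantly. Their 9,246,142 mates are then considered individually.

  • Single-mate alignment results

  • 8,363,217 (90.45%) aligned 0 times:
    About 8.36 million mates (90.45% of these 9,246,142 mates) did not align to the genome.

  • 399,654 (4.32%) aligned exactly 1 time:
    About 0.40 million mates (4.32%) aligned to exactly one location.

  • 483,271 (5.23%) aligned >1 times:
    About 0.48 million mates (5.23%) aligned to more than one location.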
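
Each percentage in the log is relative to the count on the enclosing "of these" line, not to the total number of reads, which is easy to misread. A short sketch that recomputes every percentage from its own denominator, using only the counts in the summary (the variable names are hypothetical):

```python
def pct(part, whole):
    """Percentage of `whole`, rounded to two decimals as in the Bowtie 2 log."""
    return round(100 * part / whole, 2)


pairs = 63_778_510
no_concordant = 5_444_376
discordant = 821_305

# Pairs left over after concordant and discordant alignment, and their mates.
unaligned_pairs = no_concordant - discordant
leftover_mates = 2 * unaligned_pairs

checks = {
    "concordant 0 times (of all pairs)": pct(no_concordant, pairs),
    "concordant exactly 1 time (of all pairs)": pct(48_955_563, pairs),
    "concordant >1 times (of all pairs)": pct(9_378_571, pairs),
    "discordant (of pairs with no concordant alignment)": pct(discordant, no_concordant),
    "mates aligned 0 times (of leftover mates)": pct(8_363_217, leftover_mates),
    "mates aligned exactly 1 time (of leftover mates)": pct(399_654, leftover_mates),
    "mates aligned >1 times (of leftover mates)": pct(483_271, leftover_mates),
}

print("pairs not aligned as pairs:", unaligned_pairs)
print("mates in those pairs:", leftover_mates)
for label, value in checks.items():
    print(f"{label}: {value:.2f}%")
```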

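IV. Interpreting the key metrics

| Category | Value | Meaning |
| --- | --- | --- |
| Concordant pair alignment rate | 76.76% + 14.70% = 91.46% | 91.46% of pairs aligned concordantly at least once, which is in line with expectations for the experiment |
| Contribution of single-mate alignments | 4.32% + 5.23% = 9.55% | 9.55% of the 9,246,142 mates from pairs that did not align as pairs were aligned individually; this is 882,925 mates, or about 0.69% of all mates (whether to keep them depends on the analysis settings) |
| Overall data utilization | 93.44% | Adequate for most downstream analyses (e.g. RNA-seq, ChIP-seq); alignment parameters can be tuned if higher precision is needed |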
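
To see how much each category contributes to the 93.44% overall rate, every count can be expressed as a share of all 127,557,020 mates. The breakdown below uses only the counts from the summary; it is a sketch for checking the arithmetic, and the category labels are my own.

```python
total_mates = 2 * 63_778_510

# Mates contributed by each category (pairs count twice, single mates once).
aligned = {
    "concordant pairs": 2 * (48_955_563 + 9_378_571),
    "discordant pairs": 2 * 821_305,
    "individually aligned mates": 399_654 + 483_271,
}

shares = {name: 100 * n / total_mates for name, n in aligned.items()}
overall = sum(shares.values())

for name, share in shares.items():
    print(f"{name}: {share:.2f} percentage points")
print(f"overall alignment rate: {overall:.2f}%")
```

Concordant pairs account for almost all of the overall rate; individually aligned mates add well under one percentage point.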

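V. Recommendations
  • Parameter tuning: Bowtie 2 has no -m option (that flag belongs to Bowtie 1). -k <int> sets how many alignments are reported per read (up to <int>) rather than removing multi-mapped reads; to limit the effect of multi-mapped reads, a common approach is to filter the output on mapping quality (MAPQ).
  • Quality control: check whether the unaligned reads contain adapter contamination or low-quality sequence.
  • Reference genome check: confirm that the reference genome version matches the source of the sequencing data.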
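
For the first recommendation, one common way to reduce the influence of multi-mapped reads is to filter the SAM output on the MAPQ field (column 5), since reads with several equally good placements typically receive a low MAPQ. The sketch below is a plain-Python illustration on made-up records: the function name `filter_by_mapq`, the example records, and the threshold of 10 are all assumptions, and in practice a dedicated tool such as samtools would normally do this.

```python
def filter_by_mapq(sam_lines, min_mapq):
    """Yield SAM header lines plus alignment records whose MAPQ is >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):          # header lines are passed through unchanged
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[4]) >= min_mapq:    # MAPQ is the 5th tab-separated SAM field
            yield line


# Hypothetical records for illustration only (not taken from a real run).
example = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t200\t1\t4M\t*\t0\t0\tACGT\tIIII",
]

kept = list(filter_by_mapq(example, min_mapq=10))
for record in kept:
    print(record)
```

The right threshold depends on the analysis; 10 is only an illustrative value here.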