Beware of duplicated (partially overlapping) regions in a BED file when using bedtools intersect
When we compare two sets of genomic features, a very common task is to assess how they differ within the regions they share. bedtools intersect is the tool most often used to extract the regions common to two BED files. However, unexpected duplicate regions appear in the output if one or both input files contain intervals that overlap each other. Let's look at an example:
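# A.bed: the first two intervals partially overlap each other
with open('A.bed', 'w') as f:
    f.write('\n'.join([
        '\t'.join(['chr1', '10', '20']),
        '\t'.join(['chr1', '15', '20']),
        '\t'.join(['chr1', '30', '40']),
    ]))
# B.bed: a single interval that overlaps both of those A intervals
with open('B.bed', 'w') as f:
    f.write('\n'.join([
        '\t'.join(['chr1', '15', '20']),
    ]))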
!bedtools intersect -a A.bed -b B.bed
chr1 15 20
chr1 15 20
The overlapping region chr1:15-20 is reported twice, once for each interval in A.bed that overlaps the interval in B.bed.
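To see why the duplicate appears, here is a minimal pure-Python sketch of the default behaviour: one output row for every overlapping pair of A and B intervals. The function name `intersect` and the tuple layout are my own choices for illustration, not part of bedtools, and the sketch ignores strand and extra BED columns.

```python
def intersect(a_intervals, b_intervals):
    """Return the overlapping part of every (A, B) pair, one row per pair."""
    out = []
    for chrom_a, start_a, end_a in a_intervals:
        for chrom_b, start_b, end_b in b_intervals:
            if chrom_a != chrom_b:
                continue
            # The shared part of two intervals runs from the larger start
            # to the smaller end; it is empty if start >= end.
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:
                out.append((chrom_a, start, end))
    return out


# Same intervals as in A.bed and B.bed
A = [('chr1', 10, 20), ('chr1', 15, 20), ('chr1', 30, 40)]
B = [('chr1', 15, 20)]

hits = intersect(A, B)
print(hits)  # [('chr1', 15, 20), ('chr1', 15, 20)]
```

Both chr1:10-20 and chr1:15-20 overlap the B interval, so the same region is emitted once per A interval.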
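To avoid such redundant regions, we can run bedtools merge on the file before the intersection. Note that bedtools merge expects its input to be sorted by chromosome and then by start position, which A.bed already is. For example: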
!bedtools merge -i A.bed > A.merge.bed
cat A.merge.bed
chr1 10 20
chr1 30 40
!bedtools intersect -a A.merge.bed -b B.bed
chr1 15 20
Problem solved!
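For intuition about what the merge step does, here is a rough pure-Python sketch that collapses overlapping (or directly adjacent) intervals on the same chromosome. The helper name `merge` is hypothetical and this is only an approximation of the idea under the assumption of plain three-column intervals, not the bedtools implementation.

```python
def merge(intervals):
    """Collapse overlapping or touching intervals into single intervals."""
    merged = []
    # Sorting by (chrom, start, end) puts overlapping intervals next to each other.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(iv) for iv in merged]


# Same intervals as in A.bed
A = [('chr1', 10, 20), ('chr1', 15, 20), ('chr1', 30, 40)]

merged_A = merge(A)
print(merged_A)  # [('chr1', 10, 20), ('chr1', 30, 40)]
```

After merging, each position in A is covered by at most one interval, so a region shared with B can only be reported once.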