My PI has asked me to compare all the genes in two strains of Streptococcus pneumoniae (Taiwan 19F and TIGR4).
He has already created a similar list for TIGR4 and D39 that matches up all the homologous genes (for example, SP_0001 is homologous to SPD_0045).
When he showed me this, he was looking up each gene one at a time, taking its sequence, and blasting it against the other genome. I feel like this method will take me eons, so I was wondering how I can blast all the genes from TIGR4 against the Taiwan 19F genome in one go.
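To make the batch idea concrete, here is a minimal sketch of how the results could be reduced to one best match per gene. It assumes BLAST+ is run locally with tabular output (for example `blastn -query genes.fasta -subject genome.fasta -outfmt 6`), which is not something stated above; the file names, the subject name `taiwan19F_chr`, and the demo rows are all made up for illustration.

```python
import csv
import io

# The 12 default columns of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle):
    """Return {query_id: hit} keeping only the highest-bitscore hit per query."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["bitscore"] = float(hit["bitscore"])
        hit["sstart"] = int(hit["sstart"])
        hit["send"] = int(hit["send"])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Made-up demo rows (NOT real BLAST results), only to show the input shape.
demo = io.StringIO(
    "SP_0001\ttaiwan19F_chr\t99.1\t1362\t12\t0\t1\t1362\t197\t1558\t0.0\t2450\n"
    "SP_0001\ttaiwan19F_chr\t81.0\t200\t38\t0\t5\t204\t9000\t9199\t1e-30\t130\n"
)
for query, hit in best_hits(demo).items():
    print(query, hit["sseqid"], hit["sstart"], hit["send"], hit["pident"])
```

A single best hit per gene is only a first approximation of homology; a stricter list would also filter on identity and alignment length, or require reciprocal best hits.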
I plan to download the FASTA sequences for all the TIGR4 genes, combine them into one file, and then upload that file and blast it as a whole. What is the best way to download all the gene sequences for this strain? Are they on the FTP site? Would all the genes be in the .ffn download? I was able to blast that entire file against the genome. However, we want to identify and match genes by their locus tags, and the locus tag is not one of the identifiers in the .ffn download.
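One possible way to attach locus tags, sketched under two assumptions that should be checked against the actual files: that the FASTA headers carry genome coordinates in a form like `:197-1558` (or `:c3112-2864` for the reverse strand), and that a GenBank flat file for the same genome is available, whose CDS features list `/locus_tag` qualifiers. The function names and the demo coordinates are hypothetical.

```python
import re

# A feature key sits at column 6 of a GenBank flat file; qualifiers are indented further.
FEATURE_KEY = re.compile(r"^ {5}\S")
# Simple CDS locations only, e.g. "197..1558" or "complement(2864..3112)".
CDS_LINE = re.compile(r"^ {5}CDS\s+(?:complement\()?<?(\d+)\.\.>?(\d+)\)?\s*$")
LOCUS_TAG = re.compile(r'/locus_tag="([^"]+)"')
# ASSUMED header coordinate pattern, e.g. ":197-1558" or ":c3112-2864".
HEADER_COORDS = re.compile(r":c?(\d+)-(\d+)")


def locus_tags_by_coords(genbank_text):
    """Map (start, end) -> locus tag for simple (non-joined) CDS features."""
    mapping = {}
    coords = None
    for line in genbank_text.splitlines():
        if FEATURE_KEY.match(line):
            m = CDS_LINE.match(line)
            coords = (int(m.group(1)), int(m.group(2))) if m else None
            continue
        if coords is not None:
            tag = LOCUS_TAG.search(line)
            if tag:
                mapping[coords] = tag.group(1)
                coords = None
    return mapping


def relabel_fasta(fasta_text, mapping):
    """Prefix each FASTA header with its locus tag when the coordinates match."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            m = HEADER_COORDS.search(line)
            if m:
                a, b = int(m.group(1)), int(m.group(2))
                tag = mapping.get((min(a, b), max(a, b)))
                if tag:
                    line = f">{tag} {line[1:]}"
        out.append(line)
    return "\n".join(out) + "\n"


# Made-up demo input (coordinates are illustrative, not taken from the real genome).
gb = ('     CDS             197..1558\n'
      '                     /locus_tag="SP_0001"\n')
fa = ">ref|DEMO|:197-1558 demo record\nATGACC\n"
print(relabel_fasta(fa, locus_tags_by_coords(gb)), end="")
```

Headers whose coordinates find no match are left unchanged, so any genes that failed to map are easy to spot afterwards.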
Any advice on how best to do this would be greatly appreciated!