📊 Paper Summary
| Item | Details |
|---|---|
| Title | The Evolution and Global Spatiotemporal Dynamics of Senecavirus A |
| Journal | Microbiology Spectrum (2022) |
| Key analysis | Recombination, phylogeny, selection, spatiotemporal dynamics |
| Dataset | 249 SVA complete genomes from GenBank (Dec 2021) |
| Key finding | Brazil as source of global transmission since 2015 |
🗂️ Our Reproduction Plan (Section by Section)
Let's tackle this one analysis at a time:
| Section | Analysis | Status |
|---|---|---|
| 1 | Data collection (accession numbers) | 🔴 Need supplement |
| 2 | Recombination analysis (RDP4) | 🔴 Pending data |
| 3 | Phylogenetic tree (BI tree, Fig 1) | 🔴 Pending data |
| 4 | Selection analysis (Table 1) | 🟡 Can explain |
| 5 | Evolutionary dynamics (BEAST) | 🟡 Can explain |
| 6 | Phylogeography (Brazil as source) | 🟡 Can explain |
🔴 STEP 1: The Data — We Need Accession Numbers
The paper clearly states:
"A total of 249 SVA complete genomes were obtained from GenBank in December 2021 (see Table S1 in the supplemental material)."
Your task: Download the supplementary material
Go to: https://journals.asm.org/doi/10.1128/spectrum.02090-22
Look for: "Supplemental Material" → Download "spectrum.02090-22-s0001.pdf"
Once you have Table S1, paste the accession numbers here. I will then:
1. Extract all 249 accession numbers
2. Generate the complete download script
3. Remove the 11 identical sequences (as they did)
4. Remove the 5 recombinants (as they did) → final 233 sequences
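The filtering arithmetic above can be checked with a few lines of shell. This is a minimal sketch: the three counts are the ones quoted from the paper, while `count_records` is a hypothetical helper name that simply counts FASTA header lines.

```shell
#!/bin/sh
# Counts quoted from the paper: 249 genomes, 11 identical, 5 recombinant.
total=249
identical=11
recombinant=5

after_dedup=$((total - identical))
final=$((after_dedup - recombinant))
echo "after removing identical sequences: $after_dedup"
echo "after removing recombinants: $final"

# Hypothetical helper: count the records in a FASTA file (one '>' per record).
count_records() {
  grep -c '^>' "$1"
}
```

Running `count_records` on the FASTA file produced by each filtering step gives a quick check that the record count matches the expected figure.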
📥 STEP 2: Download Pipeline (Ready When You Get Accessions)
Once you paste the accessions, use one of these methods:

    # Method 1: Batch Entrez (web)
    # Go to: https://www.ncbi.nlm.nih.gov/sites/batchentrez
    # Upload a text file of accession numbers (one per line) → Download FASTA

    # Method 2: Command line (if you have Entrez Direct)
    # Redirecting from /dev/null stops esearch from consuming the loop's input
    while read -r acc; do
      esearch -db nucleotide -query "$acc" < /dev/null | efetch -format fasta
    done < accessions.txt > all_249_sva.fasta

🧬 STEP 3: Alignment (as per paper)
The paper used MACSE v2 (Multiple Alignment of Coding Sequences):

    # Install MACSE
    git clone https://github.com/ranwez/MACSE_V2_PIPELINES.git

    # Run alignment (MACSE writes a nucleotide and an amino-acid alignment)
    java -jar MACSE_v2.06.jar -prog alignSequences \
      -seq all_249_sva.fasta \
      -out_NT aligned_sva.fasta \
      -out_AA aligned_sva_AA.fasta

Alternative (simpler): Use MAFFT

    mafft --auto --maxiterate 1000 all_249_sva.fasta > aligned_sva.fasta

🌳 STEP 4: Phylogenetic Tree (Figure 1)
The paper used:
Bayesian Inference (BI) with MrBayes
GTR+F+I+G4 substitution model
Posterior probabilities ≥0.99
Simpler reproduction with IQ-TREE:

    iqtree -s aligned_sva.fasta -m TEST -bb 1000 -nt AUTO

For Bayesian tree (BEAST/MrBayes):

    # MrBayes block: start MrBayes with `mb`, then enter these commands at its prompt
    execute aligned_sva.nex
    lset nst=6 rates=invgamma
    mcmc ngen=10000000 samplefreq=1000

🔬 STEP 5: Selection Analysis (Table 1)
The paper used HyPhy with multiple methods:

    # Install HyPhy
    conda install -c bioconda hyphy

    # Run BUSTED (episodic selection)
    hyphy busted --alignment aligned_sva.fasta --tree tree.nwk

    # Run FEL (fixed effects likelihood)
    hyphy fel --alignment aligned_sva.fasta --tree tree.nwk

    # Run FUBAR
    hyphy fubar --alignment aligned_sva.fasta --tree tree.nwk

    # Run RELAX (relaxed selection); the test branches must be labelled {cladeII} in the tree
    hyphy relax --alignment aligned_sva.fasta --tree tree.nwk --test cladeII

They reported 27 positively selected sites (Table 1). The per-protein counts listed here add up to 26, so check the breakdown against Table 1:
RdRp: 7 sites
VP1: 5 sites
3C: 5 sites
2C: 3 sites
VP2: 2 sites
VP4, VP3, 2A, 2B: 1 site each
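A quick tally of the per-protein counts listed above is a cheap sanity check before copying them into a write-up. The sketch below sums them with awk; `tally_sites` is a hypothetical helper name, and the numbers are exactly the ones in the list. They add up to 26, one short of the 27 quoted, so the breakdown is worth verifying against Table 1 of the paper.

```shell
#!/bin/sh
# Per-protein counts exactly as listed above (protein name, number of sites).
sites="RdRp 7
VP1 5
3C 5
2C 3
VP2 2
VP4 1
VP3 1
2A 1
2B 1"

# Hypothetical helper: sum the second column of "name count" lines on stdin.
tally_sites() {
  awk '{ sum += $2 } END { print sum + 0 }'
}

printf '%s\n' "$sites" | tally_sites   # prints 26
```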
⏱️ STEP 6: Evolutionary Dynamics (BEAST)
The paper used:
Clock model: Uncorrelated relaxed log-normal (UCLN)
Demographic model: Bayesian SkyGrid
Substitution rate: 3.35 × 10⁻³ subs/site/year
TMRCA: 1986 (95% HPD: 1983-1988)
To reproduce:

    # Create the BEAST XML file first (requires BEAUti), then run:
    beast -beagle sva_analysis.xml

🌍 STEP 7: Phylogeography (Brazil as Source)
The paper used MASCOT (Marginal Approximation of the Structured Coalescent) in BEAST 2.
Key result: Brazil root state posterior probability = 0.4811
📋 What You Need To Do Right Now
| Priority | Action |
|---|---|
| 1 | Download Supplemental Table S1 from the journal website |
| 2 | Paste the accession numbers here |
| 3 | Tell me which analysis you want to reproduce first |
✅ What I Will Give You Once You Paste Accessions
✅ Complete list of 249 accession numbers
✅ Download script (bash)
✅ Full alignment command
✅ Phylogenetic tree command (IQ-TREE + MrBayes)
✅ Selection analysis commands (HyPhy)
✅ Methods section for your assignment
✅ Figure reproduction steps (Fig 1, 2, 3, 4, 5, 6, 7, 8)
Go download the supplementary material now. Paste Table S1 here, and we will start the complete reproduction.
🧬 STEP 3: Alignment (as per paper)
The paper used MACSE v2 (Multiple Alignment of Coding Sequences):
Command I ran and its output:

    (base) suman@SumanPC:~/Senecapaper_2026_april$ java -jar macse_v2.07.jar -prog alignSequences -seq /home/suman/Senecapaper_2026_april/all_233_final.fasta
    The file '/home/suman/Senecapaper_2026_april/all_233_final_NT.fasta' was created.
    The file '/home/suman/Senecapaper_2026_april/all_233_final_AA.fasta' was created.

🌳 STEP 4: Phylogenetic Tree (Figure 1)
Command:

    iqtree -s aligned_sva.fasta -m TEST -bb 1000 -nt AUTO

What I got after the analysis:
Analysis results written to:
IQ-TREE report: aligned_svaMAFFTalternative.fasta.iqtree
Maximum-likelihood tree: aligned_svaMAFFTalternative.fasta.treefile
Likelihood distances: aligned_svaMAFFTalternative.fasta.mldist
Ultrafast bootstrap approximation results written to:
Split support values: aligned_svaMAFFTalternative.fasta.splits.nex
Consensus tree: aligned_svaMAFFTalternative.fasta.contree
Screen log file: aligned_svaMAFFTalternative.fasta.log
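One quick check on the tree files listed above is that the tree contains as many tips as there are sequences in the alignment. In a Newick string the number of tips equals the number of commas plus one, so this can be done without any phylogenetics software. A minimal sketch, assuming the file holds a single tree and that no taxon label contains a comma; `count_tips` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count the tips of a single Newick tree.
# Tips = commas + 1, provided no taxon label contains a comma.
count_tips() {
  tr -cd ',' < "$1" | wc -c | awk '{ print $1 + 1 }'
}
```

For example, `count_tips aligned_svaMAFFTalternative.fasta.treefile` should report the same number as the count of `>` lines in the alignment that was given to IQ-TREE.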
📋 Extracted Accession Numbers (Page 1 of Table S1)
Here are the accession numbers I can clearly read from the PDF:
| Accession | Strain | Country | Year |
|---|---|---|---|
| DQ641257 | SVV-001 | USA | 2002 |
| KC667560 | 11-55910-3 | Canada | 2011 |
| KR063107 | SVA/BRA/MG1/2015 | Brazil | 2015 |
| KR063108 | SVA/BRA/MG2/2015 | Brazil | 2015 |
| KR063109 | SVA/BRA/GO3/2015 | Brazil | 2015 |
| KT757280 | USA/IA40380/2015 | USA | 2015 |
| KT757281 | USA/SD41901/2015 | USA | 2015 |
| KT757282 | USA/IA46008/2015 | USA | 2015 |
| KT321458 | CH-01-2015 | China | 2015 |
| KU359210 | MN15-84-4 | USA | 2015 |
| KU359211 | MN15-84-8 | USA | 2015 |
| KU359212 | MN15-84-21 | USA | 2015 |
| KU359213 | MN15-84-22 | USA | 2015 |
| KU359214 | MN15-308-M3 | USA | 2015 |
| KU058182 | SVA-OH1 | USA | 2015 |
| KU058183 | SVA-OH2 | USA | 2015 |
| KT827251 | USA/GBI29/2015 | USA | 2015 |
| KU051391 | US-15-39812IA | USA | 2015 |
| KU051392 | US-15-40380IA | USA | 2015 |
| KU051393 | US-15-40381IA | USA | 2015 |
| KU051394 | US-15-41901SD | USA | 2015 |
| KX019804 | KS15-01 | USA | 2015 |
| KX377924 | HB-CH-2016 | China | 2016 |
| KX223836 | USA/IN_Purdue_4885/2015 | USA | 2015 |
| KX778101 | SD15-26 | USA | 2015 |
| KX857728 | Colombia/2016 | Colombia | 2016 |
| KY172968 | SVA-715 | USA | 2014 |
| KU954087 | USA/IA39812/2015_P1 | USA | 2015 |
| KU954088 | USA/IA40381/2015_P1 | USA | 2015 |
| KU954089 | USA/IA44662/2015_P1 | USA | 2015 |
| KU954090 | USA/IA44952/2015_P1 | USA | 2015 |
| KX751943 | CH-DB-11-2015 | China | 2015 |
| KX751944 | CH-DL-01-2016 | China | 2016 |
| KX751945 | CH-LX-01-2016 | China | 2016 |
| KX759146 | CH-ZW-01-2016 | China | 2016 |
| KY038016 | CH/GXI09/2016 | China | 2016 |
| KY486156 | SVA/Canada/MB/NCFAD-104-1/2015 | Canada | 2015 |
| KY486157 | SVA/Canada/MB/NCFAD-104-6/2015 | Canada | 2015 |
| KY486158 | SVA/Canada/MB/NCFAD-104-9/2015 | Canada | 2015 |
| KY486159 | SVA/Canada/MB/NCFAD-108-12/2015 | Canada | 2015 |
| KY486160 | SVA/Canada/MB/NCFAD-108-16/2015 | Canada | 2015 |
📥 Page 5 Accessions (I see more)
From page 5 of the PDF:
| Accession | Strain | Country | Year |
|---|---|---|---|
| MN233024 | USA/MO15-029085/2015 | USA | 2015 |
| MN233025 | USA/KS15-031348/2015 | USA | 2015 |
| MN233026 | USA/NC88-23626/1988 | USA | 1988 |
| MN233027 | USA/MN88-36695/1988 | USA | 1988 |
| MN233028 | USA/89-47552/1989 | USA | 1989 |
| MN233029 | USA/NJ90-10324/1989 | USA | 1989 |
| MN233030 | USA/IA90-23664/1990 | USA | 1990 |
| MN233031 | USA/IL92-48963/1992 | USA | 1992 |
| MN233032 | USA/IL94-9356/1993 | USA | 1993 |
| MN233033 | USA/MN99-29256/1999 | USA | 1999 |
| MN233034 | USA/LA97-98061/1997 | USA | 1997 |
| MK333629 | SVA/USA/MN/004/2015 | USA | 2015 |
| MK333630 | SVA/USA/MN/005/2015 | USA | 2015 |
| MK333631 | SVA/USA/MN/006/2015 | USA | 2015 |
| MK333632 | SVA/USA/MN/007/2015 | USA | 2015 |
| MK333633 | SVA/USA/MN/009/2016 | USA | 2016 |
| MK333634 | SVA/USA/MN/010/2016 | USA | 2016 |
| MK333635 | SVA/USA/MN/011/2016 | USA | 2016 |
| MK333636 | SVA/USA/MN/012/2016 | USA | 2016 |
| MK333637 | SVA/USA/MN/013/2016 | USA | 2016 |
| MN433300 | SDta/2018 | China | 2018 |
| MN164664 | MN_US_2015 | USA | 2015 |
| MN812938 | USA/IA89-47552/1989 | USA | 1989 |
| MN812943 | USA/TN06-00310/2006 | USA | 2006 |
| MN812944 | USA/MN07-00487/2007 | USA | 2007 |
| MN812945 | USA/MN07-00488/2007 | USA | 2007 |
| MN812946 | USA/IA09-34037/2009 | USA | 2009 |
| MN812947 | USA/HI13-007758/2013 | USA | 2013 |
| MN812948 | USA/HI13-019514/2013 | USA | 2013 |
| MN812949 | USA/HI15-021004/2015 | USA | 2015 |
| MN812950 | USA/GA15-022479/2015 | USA | 2015 |
| MN812951 | USA/ND15-029655/2015 | USA | 2015 |
| MN812952 | USA/GA15-187/2015 | USA | 2015 |
| MN812953 | USA/MI15-21/2015 | USA | 2015 |
| MN812954 | USA/IL15-229/2015 | USA | 2015 |
| MN812955 | USA/CA15-52/2015 | USA | 2015 |
| MN812956 | USA/IA15-64/2015 | USA | 2015 |
| MN812957 | USA/MI15-7/2015 | USA | 2015 |
| MN812958 | USA/MI16-038766/2016 | USA | 2016 |
| MN812959 | USA/MI17-011956/2017 | USA | 2017 |
| MN812960 | USA/WI17-014775/2017 | USA | 2017 |
| MN017170 | GDHY/2018 | China | 2018 |
| MN423333 | CH-GDZQ-2018 | China | 2018 |
🚀 Complete Download Strategy
Since the PDF has 249 accession numbers (too many to manually type), here is the most efficient approach:
Method 1: Use NCBI Batch Entrez (Easiest)
1. Copy all accession numbers from the PDF into a text file (one per line)
2. Open Batch Entrez, select the Nucleotide database, and upload the file
3. Click "Retrieve" → Download FASTA
Method 2: Create a text file and use command line

    # Create a file with all accessions (one per line).
    # Only the 84 accessions transcribed above are listed here; add the rest of Table S1 to reach 249.
    printf '%s\n' \
      DQ641257 KC667560 KR063107 KR063108 KR063109 KT757280 KT757281 \
      KT757282 KT321458 KU359210 KU359211 KU359212 KU359213 KU359214 \
      KU058182 KU058183 KT827251 KU051391 KU051392 KU051393 KU051394 \
      KX019804 KX377924 KX223836 KX778101 KX857728 KY172968 KU954087 \
      KU954088 KU954089 KU954090 KX751943 KX751944 KX751945 KX759146 \
      KY038016 KY486156 KY486157 KY486158 KY486159 KY486160 MN233024 \
      MN233025 MN233026 MN233027 MN233028 MN233029 MN233030 MN233031 \
      MN233032 MN233033 MN233034 MK333629 MK333630 MK333631 MK333632 \
      MK333633 MK333634 MK333635 MK333636 MK333637 MN433300 MN164664 \
      MN812938 MN812943 MN812944 MN812945 MN812946 MN812947 MN812948 \
      MN812949 MN812950 MN812951 MN812952 MN812953 MN812954 MN812955 \
      MN812956 MN812957 MN812958 MN812959 MN812960 MN017170 MN423333 \
      > accessions.txt

    # Download all sequences
    efetch -db nucleotide -input accessions.txt -format fasta > all_249_sva.fasta

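Accession lists copied out of a PDF often pick up stray spaces, Windows line endings, blank lines, or repeated entries, so it helps to clean the list and count it before downloading. The sketch below is one way to do that with awk; `clean_accessions` and `check_accession_count` are hypothetical helper names, and the expected total is passed in by the caller.

```shell
#!/bin/sh
# Hypothetical helper: normalise an accession list read from stdin.
# Strips carriage returns and surrounding whitespace, drops blank lines,
# and keeps only the first copy of each accession.
clean_accessions() {
  awk '{
    gsub(/\r/, "")
    gsub(/^[ \t]+|[ \t]+$/, "")
    if ($0 != "" && !seen[$0]++) print
  }'
}

# Hypothetical helper: report how many unique accessions a file holds and
# return success only if that matches the expected total.
# $1 = accession file, $2 = expected number of accessions
check_accession_count() {
  found=$(clean_accessions < "$1" | wc -l | awk '{ print $1 }')
  echo "$found of $2 accessions present"
  [ "$found" -eq "$2" ]
}
```

A call such as `check_accession_count accessions.txt 249` reports the shortfall and returns a non-zero status while the list is still incomplete.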
🧬 Next Steps After Download
Once you have downloaded all 249 sequences, here is the complete analysis pipeline:
Step 1: Remove identical sequences (11 removed as per paper)

    # Use seqkit to remove duplicates (-s compares records by sequence)
    seqkit rmdup -s all_249_sva.fasta -o all_238_unique.fasta

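If seqkit is not installed, the same idea (keep the first record for each distinct sequence) can be sketched in plain awk. This is an illustration under stated assumptions, not the paper's procedure: the comparison is an exact, case-sensitive string match, and `dedup_by_sequence` is a hypothetical helper name. Sequences are written back on a single line.

```shell
#!/bin/sh
# Hypothetical helper: keep the first FASTA record for each distinct sequence.
# Multi-line sequences are joined before comparison; matching is exact and
# case-sensitive.
dedup_by_sequence() {
  awk '
    function emit() {
      if (header != "" && !(seq in seen)) {
        seen[seq] = 1
        print header
        print seq
      }
    }
    /^>/ { emit(); header = $0; seq = ""; next }
    { seq = seq $0 }
    END { emit() }
  ' "$1"
}
```

Usage would look like `dedup_by_sequence all_249_sva.fasta > all_238_unique.fasta`; counting the `>` lines afterwards shows how many records were dropped.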
Step 2: Remove recombinant sequences (5 strains from Table S2)
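
    # Remove recombinants using seqkit grep
    # -r treats the pattern as a regular expression, so the | alternation works
    # and IDs with a version suffix (e.g. .1) still match
    seqkit grep -v -r -p "^(MH316113|MK357117|MN887249|MK357116|MG765559)" \
      all_238_unique.fasta > all_233_final.fasta
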
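The same exclusion can be written in plain awk, which makes the matching rule explicit. A minimal sketch, assuming the record ID is the first word of the header and may carry a version suffix such as `.1`; `drop_accessions` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print a FASTA file without the listed accessions.
# $1 = FASTA file, $2 = space-separated accessions to drop
drop_accessions() {
  awk -v drop="$2" '
    BEGIN {
      n = split(drop, list, " ")
      for (i = 1; i <= n; i++) bad[list[i]] = 1
    }
    /^>/ {
      id = substr($1, 2)            # first word of the header, without ">"
      sub(/\.[0-9]+$/, "", id)      # strip a version suffix such as ".1"
      keep = !(id in bad)
    }
    keep { print }
  ' "$1"
}
```

For the five recombinants named above: `drop_accessions all_238_unique.fasta "MH316113 MK357117 MN887249 MK357116 MG765559" > all_233_final.fasta`.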
Step 3: Alignment with MACSE (as paper)
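
    # Download MACSE (check the release URL and version before running)
    wget https://github.com/ranwez/MACSE_V2_PIPELINES/releases/download/V2.06/MACSE_v2.06.jar

    # Run alignment (MACSE writes a nucleotide and an amino-acid alignment)
    java -jar MACSE_v2.06.jar -prog alignSequences \
      -seq all_233_final.fasta \
      -out_NT aligned_sva.fasta \
      -out_AA aligned_sva_AA.fasta
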
Step 4: Phylogenetic tree (Figure 1)

    # IQ-TREE for ML tree
    iqtree -s aligned_sva.fasta -m GTR+F+I+G4 -bb 1000 -nt AUTO

    # MrBayes for BI tree (as in paper): start `mb`, then enter these commands at its prompt
    execute aligned_sva.nex
    lset nst=6 rates=invgamma
    mcmc ngen=10000000 samplefreq=1000
    sump
    sumt

Step 5: Selection analysis (Table 1)

    # HyPhy analysis
    hyphy busted --alignment aligned_sva.fasta --tree aligned_sva.fasta.treefile
    hyphy fel --alignment aligned_sva.fasta --tree aligned_sva.fasta.treefile
    hyphy fubar --alignment aligned_sva.fasta --tree aligned_sva.fasta.treefile

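HyPhy's codon-based methods expect an in-frame coding alignment, so it can save a failed run to confirm first that every aligned record has a length divisible by three. The sketch below only checks length, not stop codons or reading frame, and `report_off_frame` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print "ID length" for every FASTA record whose length
# is not a multiple of three. No output means all records pass the length check.
report_off_frame() {
  awk '
    function check() {
      if (id != "" && length(seq) % 3 != 0) print id, length(seq)
    }
    /^>/ { check(); id = substr($1, 2); seq = ""; next }
    { seq = seq $0 }
    END { check() }
  ' "$1"
}
```

Running `report_off_frame aligned_sva.fasta` before the HyPhy commands lists any records that need attention.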
⚠️ What You Need To Do Now
| Priority | Action |
|---|---|
| 1 | Copy ALL accession numbers from the PDF into a text file |
| 2 | Use NCBI Batch Entrez to download |
| 3 | Run the pipeline above |