Steps for running the analysis: taking an example paper from PubMed

📊 Paper Summary

Item	Details
Title	The Evolution and Global Spatiotemporal Dynamics of Senecavirus A
Journal	Microbiology Spectrum (2022)
Key analysis	Recombination, phylogeny, selection, spatiotemporal dynamics
Dataset	249 SVA complete genomes from GenBank (Dec 2021)
Key finding	Brazil as source of global transmission since 2015

🗂️ Our Reproduction Plan (Section by Section)

Let's tackle this one analysis at a time:

Section	Analysis	Status
1	Data collection (accession numbers)	🔴 Need supplement
2	Recombination analysis (RDP4)	🔴 Pending data
3	Phylogenetic tree (BI tree, Fig 1)	🔴 Pending data
4	Selection analysis (Table 1)	🟡 Can explain
5	Evolutionary dynamics (BEAST)	🟡 Can explain
6	Phylogeography (Brazil as source)	🟡 Can explain

🔴 STEP 1: The Data — We Need Accession Numbers

The paper clearly states:

"A total of 249 SVA complete genomes were obtained from GenBank in December 2021 (see Table S1 in the supplemental material)."

Your task: Download the supplementary material

Go to: https://journals.asm.org/doi/10.1128/spectrum.02090-22

Look for: "Supplemental Material" → Download "spectrum.02090-22-s0001.pdf"

Once you have Table S1, paste the accession numbers here. I will then:

Extract all 249 accession numbers
Generate the complete download script
Remove the 11 identical sequences (as they did)
Remove the 5 recombinants (as they did) → final 233 sequences

📥 STEP 2: Download Pipeline (Ready When You Get Accessions)

Once you have the accessions, use either of these methods:

# Method 1: Batch Entrez (web)
# Go to: https://www.ncbi.nlm.nih.gov/sets/batch
# Paste accession numbers → Download FASTA

# Method 2: Command line (if you have Entrez Direct); one efetch call per accession
while read -r acc; do
  efetch -db nucleotide -id "$acc" -format fasta
done < accessions.txt > all_249_sva.fasta
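Batch downloads can silently skip records, for example withdrawn or mistyped accessions. It is worth confirming that every accession in accessions.txt produced a FASTA record before going further. Below is a minimal sketch in shell and awk. It assumes one accession per line in the list and NCBI-style headers that begin with a versioned accession such as >KR063107.1. The function name missing_accessions and the demo files are hypothetical.

```shell
#!/usr/bin/env bash
# missing_accessions LIST FASTA
# Prints accessions listed in LIST (one per line) that have no record in FASTA.
# Assumes FASTA headers start with the accession, optionally versioned (>KR063107.1 ...).
missing_accessions() {
  awk '
    NR == FNR {                      # first file: the accession list
      sub(/\r$/, "")                 # tolerate Windows line endings
      if ($1 != "") want[$1] = 1
      next
    }
    /^>/ {                           # second file: FASTA headers only
      acc = substr($1, 2)
      sub(/\.[0-9]+$/, "", acc)      # drop the version suffix
      got[acc] = 1
    }
    END { for (a in want) if (!(a in got)) print a }
  ' "$1" "$2" | sort
}

# Demo with throwaway files (real use: missing_accessions accessions.txt all_249_sva.fasta)
list=$(mktemp); fa=$(mktemp)
printf 'KR063107\nKR063108\nKR063109\n' > "$list"
printf '>KR063107.1 demo\nACGT\n>KR063109.1 demo\nACGT\n' > "$fa"
missing_accessions "$list" "$fa"    # prints KR063108
```

An empty result means every accession was retrieved. Any accession that is printed should be fetched again or checked against Table S1.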

🧬 STEP 3: Alignment (as per paper)

The paper used MACSE v2 (Multiple Alignment of Coding Sequences):

# Install MACSE
git clone https://github.com/ranwez/MACSE_V2_PIPELINES.git

# Run alignment
java -jar MACSE_v2.06.jar -prog alignSequences \
  -seq all_249_sva.fasta \
  -out_NT aligned_sva.fasta \
  -out_AA aligned_sva_AA.fasta

Alternative (simpler): Use MAFFT

mafft --auto --maxiterate 1000 all_249_sva.fasta > aligned_sva.fasta
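Whichever aligner you use, IQ-TREE and MrBayes expect every sequence in the output to have the same length. Below is a quick sketch for checking this on a FASTA alignment, which may be line-wrapped. aln_lengths is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# aln_lengths FASTA: print each distinct sequence length with the number of
# sequences that have it. A proper alignment yields exactly one line.
aln_lengths() {
  awk '
    /^>/ { if (seen) n[len]++; seen = 1; len = 0; next }
    { gsub(/[ \t\r]/, ""); len += length($0) }   # sum wrapped lines
    END { if (seen) n[len]++; for (l in n) print l "\t" n[l] }
  ' "$1" | sort -n
}

# Demo (real use: aln_lengths aligned_sva.fasta)
fa=$(mktemp)
printf '>a\nAC-T\nGG\n>b\nACGTGG\n>c\nACG\n' > "$fa"
aln_lengths "$fa"
```

In the demo, sequence c is shorter than the others, so two lengths are reported. A clean alignment prints a single line.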

🌳 STEP 4: Phylogenetic Tree (Figure 1)

The paper used:

Bayesian Inference (BI) with MrBayes
GTR+F+I+G4 substitution model
Posterior probabilities ≥0.99

Simpler reproduction with IQ-TREE:

iqtree -s aligned_sva.fasta -m TEST -bb 1000 -nt AUTO

For Bayesian tree (BEAST/MrBayes):

# MrBayes commands (the executable is usually called mb; the alignment must be in NEXUS format)
mb
execute aligned_sva.nex
lset nst=6 rates=invgamma
mcmc ngen=10000000 samplefreq=1000
sump
sumt
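MrBayes reads NEXUS, not FASTA, so the aligned FASTA has to be converted into the aligned_sva.nex file that is executed above. Below is a dependency-free sketch using awk. It assumes a DNA alignment in which all sequences have the same length. Each taxon name is taken from the first word of its header, with anything other than letters, digits and underscores replaced by "_". That renaming is a conservative choice to avoid characters MrBayes may reject, not a documented MrBayes rule. MACSE marks frameshifts with "!" in its nucleotide output, and those characters should be replaced before conversion. fasta_to_nexus is a hypothetical name.

```shell
#!/usr/bin/env bash
# fasta_to_nexus FASTA: write a minimal NEXUS data block for MrBayes to stdout.
# Assumes an aligned DNA FASTA in which every sequence has the same length.
fasta_to_nexus() {
  awk '
    /^>/ {
      # Taxon name: first word of the header, with anything other than
      # letters, digits and underscores replaced by "_" (KR063107.1 -> KR063107_1)
      name = substr($1, 2)
      gsub(/[^A-Za-z0-9_]/, "_", name)
      order[++n] = name
      next
    }
    { gsub(/[ \t\r]/, ""); seq[name] = seq[name] $0 }   # join wrapped lines
    END {
      nchar = length(seq[order[1]])
      for (i = 2; i <= n; i++)
        if (length(seq[order[i]]) != nchar) {
          print "error: " order[i] " differs in length" > "/dev/stderr"
          exit 1
        }
      print "#NEXUS"
      print "begin data;"
      printf "  dimensions ntax=%d nchar=%d;\n", n, nchar
      print "  format datatype=dna missing=? gap=-;"
      print "  matrix"
      for (i = 1; i <= n; i++) printf "  %s %s\n", order[i], seq[order[i]]
      print "  ;"
      print "end;"
    }
  ' "$1"
}

# Demo on a made-up two-sequence alignment
# (real use: fasta_to_nexus aligned_sva.fasta > aligned_sva.nex)
fa=$(mktemp)
printf '>seqA.1 demo\nACGT-A\n>seqB.1 demo\nAC\nGTTA\n' > "$fa"
fasta_to_nexus "$fa"
```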

🔬 STEP 5: Selection Analysis (Table 1)

The paper used HyPhy with multiple methods:

# Install HyPhy
conda install -c bioconda hyphy

# Run BUSTED (episodic selection)
hyphy busted --alignment aligned_sva.fasta --tree tree.nwk

# Run FEL (fixed effects likelihood)
hyphy fel --alignment aligned_sva.fasta --tree tree.nwk

# Run FUBAR
hyphy fubar --alignment aligned_sva.fasta --tree tree.nwk

# Run RELAX (relaxed selection); the test branches must be labelled {cladeII} in tree.nwk
hyphy relax --alignment aligned_sva.fasta --tree tree.nwk --test cladeII
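HyPhy's codon models (BUSTED, FEL, FUBAR, RELAX) work on an in-frame coding alignment and complain about internal stop codons. SVA genomes carry untranslated regions on either side of the single polyprotein ORF, so the ORF has to be extracted before these runs. Below is a small sketch for checking the result. It assumes an aligned FASTA already trimmed to the ORF. Gaps are left as they are, so reported codon numbers refer to alignment columns. codon_check is a hypothetical name.

```shell
#!/usr/bin/env bash
# codon_check FASTA: flag sequences whose length is not a multiple of 3 and
# stop codons (TAA, TAG, TGA) before the final codon. Positions are codon
# numbers counted along the sequence as written (alignment columns / 3).
codon_check() {
  awk '
    function report(name, s,    L, i, c) {
      L = length(s)
      if (L % 3 != 0) printf "%s: length %d is not a multiple of 3\n", name, L
      for (i = 1; i + 2 <= L - 3; i += 3) {      # every codon except the last
        c = toupper(substr(s, i, 3))
        if (c == "TAA" || c == "TAG" || c == "TGA")
          printf "%s: stop codon %s at codon %d\n", name, c, (i + 2) / 3
      }
    }
    /^>/ { if (id != "") report(id, seq); id = substr($1, 2); seq = ""; next }
    { gsub(/[ \t\r]/, ""); seq = seq $0 }
    END { if (id != "") report(id, seq) }
  ' "$1"
}

# Demo on made-up sequences (real use: codon_check aligned_sva.fasta)
fa=$(mktemp)
printf '>s1\nATGTAAGCCTAA\n>s2\nATGGCCTAA\n>s3\nATGGC\n' > "$fa"
codon_check "$fa"
```

The demo reports an internal TAA in s1 and the incomplete length of s3. s2 passes, because its only stop codon is the final one.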

They report 27 positively selected sites (Table 1), but the per-protein counts below add up to 26, so check them against Table 1:

RdRp: 7 sites
VP1: 5 sites
3C: 5 sites
2C: 3 sites
VP2: 2 sites
VP4, VP3, 2A, 2B: 1 site each

⏱️ STEP 6: Evolutionary Dynamics (BEAST)

The paper used:

Clock model: Uncorrelated relaxed log-normal (UCLN)
Demographic model: Bayesian SkyGrid
Substitution rate: 3.35 × 10⁻³ subs/site/year
TMRCA: 1986 (95% HPD: 1983-1988)

To reproduce:

# Create the BEAST XML in BEAUti (tip dates, UCLN clock, SkyGrid tree prior), then run:
beast -beagle sva_analysis.xml
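The time-scaled analysis needs a sampling date for every tip. One common approach is to put the date in the sequence name so that BEAUti can parse it from the tip labels. Below is a sketch of that relabelling. It assumes Table S1 has been saved as a tab-separated file with the columns Accession, Strain, Country and Year, as in the extracted tables later in this post. relabel_fasta and table_s1.tsv are hypothetical names. Year-only dates are less precise than full collection dates.

```shell
#!/usr/bin/env bash
# relabel_fasta METADATA.tsv FASTA
# Rewrites FASTA headers as ACCESSION|COUNTRY|YEAR using a tab-separated table
# with columns Accession, Strain, Country, Year (assumed layout of Table S1).
relabel_fasta() {
  awk -F '\t' '
    NR == FNR {                              # first file: metadata table
      sub(/\r$/, "")
      label = $1 "|" $3 "|" $4
      gsub(/ /, "_", label)                  # no spaces in tip names
      meta[$1] = label
      next
    }
    /^>/ {                                   # second file: FASTA headers
      split($0, h, " ")
      acc = substr(h[1], 2)
      sub(/\.[0-9]+$/, "", acc)              # drop the version suffix
      if (acc in meta) print ">" meta[acc]
      else { print "no metadata for " acc > "/dev/stderr"; print }
      next
    }
    { print }                                # sequence lines unchanged
  ' "$1" "$2"
}

# Demo using one row of Table S1
# (real use: relabel_fasta table_s1.tsv all_233_final.fasta > relabeled.fasta)
meta=$(mktemp); fa=$(mktemp)
printf 'Accession\tStrain\tCountry\tYear\nKR063107\tSVA/BRA/MG1/2015\tBrazil\t2015\n' > "$meta"
printf '>KR063107.1 demo\nACGT\n' > "$fa"
relabel_fasta "$meta" "$fa"    # prints >KR063107|Brazil|2015 then ACGT
```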

🌍 STEP 7: Phylogeography (Brazil as Source)

The paper used MASCOT (Marginal Approximation of the Structured Coalescent) in BEAST 2.

Key result: Brazil root state posterior probability = 0.4811
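Before interpreting a root-location posterior such as 0.4811 for Brazil, it helps to know how many sequences each country contributes. Discrete location reconstructions can be influenced by uneven sampling, which is one motivation for structured-coalescent approaches such as MASCOT. Below is a sketch that tallies sequences per country. It assumes headers of the form >ACCESSION|COUNTRY|YEAR. count_by_country is a hypothetical name.

```shell
#!/usr/bin/env bash
# count_by_country FASTA: number of sequences per country, assuming headers
# of the form >ACCESSION|COUNTRY|YEAR.
count_by_country() {
  grep '^>' "$1" | awk -F '|' '{ n[$2]++ } END { for (c in n) print c "\t" n[c] }' | sort
}

# Demo with made-up labels (real use: count_by_country relabeled.fasta)
fa=$(mktemp)
printf '>A1|Brazil|2015\nACGT\n>A2|USA|2015\nACGT\n>A3|Brazil|2016\nACGT\n' > "$fa"
count_by_country "$fa"
```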

📋 What You Need To Do Right Now

Priority	Action
1	Download Supplemental Table S1 from the journal website
2	Paste the accession numbers here
3	Tell me which analysis you want to reproduce first

✅ What I Will Give You Once You Paste Accessions

✅ Complete list of 249 accession numbers
✅ Download script (bash)
✅ Full alignment command
✅ Phylogenetic tree command (IQ-TREE + MrBayes)
✅ Selection analysis commands (HyPhy)
✅ Methods section for your assignment
✅ Figure reproduction steps (Fig 1, 2, 3, 4, 5, 6, 7, 8)

Go download the supplementary material now. Paste Table S1 here, and we will start the complete reproduction.

Attached: spectrum.02090-22-s0001.pdf (PDF, 1.61 MB)

For Table S1, I have the PDF.

🧬 STEP 3: Alignment (as per paper)

The paper used MACSE v2 (Multiple Alignment of Coding Sequences):

# Install MACSE
git clone https://github.com/ranwez/MACSE_V2_PIPELINES.git

# Run alignment (the run performed here: MACSE v2.07 on the filtered 233-sequence file)
java -jar macse_v2.07.jar -prog alignSequences -seq /home/suman/Senecapaper_2026_april/all_233_final.fasta

Output:

The file '/home/suman/Senecapaper_2026_april/all_233_final_NT.fasta' was created.
The file '/home/suman/Senecapaper_2026_april/all_233_final_AA.fasta' was created.

Alternative (simpler): Use MAFFT

mafft --auto --maxiterate 1000 all_249_sva.fasta > aligned_sva.fasta

🌳 STEP 4: Phylogenetic Tree (Figure 1)

The paper used:

Bayesian Inference (BI) with MrBayes
GTR+F+I+G4 substitution model
Posterior probabilities ≥0.99

Simpler reproduction with IQ-TREE:

iqtree -s aligned_sva.fasta -m TEST -bb 1000 -nt AUTO

What I got after the analysis:

Analysis results written to:

IQ-TREE report: aligned_svaMAFFTalternative.fasta.iqtree

Maximum-likelihood tree: aligned_svaMAFFTalternative.fasta.treefile

Likelihood distances: aligned_svaMAFFTalternative.fasta.mldist

Ultrafast bootstrap approximation results written to:

Split support values: aligned_svaMAFFTalternative.fasta.splits.nex

Consensus tree: aligned_svaMAFFTalternative.fasta.contree

Screen log file: aligned_svaMAFFTalternative.fasta.log

📋 Extracted Accession Numbers (Page 1 of Table S1)

Here are the accession numbers I can clearly read from the PDF:

Accession	Strain	Country	Year
DQ641257	SVV-001	USA	2002
KC667560	11-55910-3	Canada	2011
KR063107	SVA/BRA/MG1/2015	Brazil	2015
KR063108	SVA/BRA/MG2/2015	Brazil	2015
KR063109	SVA/BRA/GO3/2015	Brazil	2015
KT757280	USA/IA40380/2015	USA	2015
KT757281	USA/SD41901/2015	USA	2015
KT757282	USA/IA46008/2015	USA	2015
KT321458	CH-01-2015	China	2015
KU359210	MN15-84-4	USA	2015
KU359211	MN15-84-8	USA	2015
KU359212	MN15-84-21	USA	2015
KU359213	MN15-84-22	USA	2015
KU359214	MN15-308-M3	USA	2015
KU058182	SVA-OH1	USA	2015
KU058183	SVA-OH2	USA	2015
KT827251	USA/GBI29/2015	USA	2015
KU051391	US-15-39812IA	USA	2015
KU051392	US-15-40380IA	USA	2015
KU051393	US-15-40381IA	USA	2015
KU051394	US-15-41901SD	USA	2015
KX019804	KS15-01	USA	2015
KX377924	HB-CH-2016	China	2016
KX223836	USA/IN_Purdue_4885/2015	USA	2015
KX778101	SD15-26	USA	2015
KX857728	Colombia/2016	Colombia	2016
KY172968	SVA-715	USA	2014
KU954087	USA/IA39812/2015_P1	USA	2015
KU954088	USA/IA40381/2015_P1	USA	2015
KU954089	USA/IA44662/2015_P1	USA	2015
KU954090	USA/IA44952/2015_P1	USA	2015
KX751943	CH-DB-11-2015	China	2015
KX751944	CH-DL-01-2016	China	2016
KX751945	CH-LX-01-2016	China	2016
KX759146	CH-ZW-01-2016	China	2016
KY038016	CH/GXI09/2016	China	2016
KY486156	SVA/Canada/MB/NCFAD-104-1/2015	Canada	2015
KY486157	SVA/Canada/MB/NCFAD-104-6/2015	Canada	2015
KY486158	SVA/Canada/MB/NCFAD-104-9/2015	Canada	2015
KY486159	SVA/Canada/MB/NCFAD-108-12/2015	Canada	2015
KY486160	SVA/Canada/MB/NCFAD-108-16/2015	Canada	2015

📥 Accessions from Page 5 of Table S1

From page 5 of the PDF:

Accession	Strain	Country	Year
MN233024	USA/MO15-029085/2015	USA	2015
MN233025	USA/KS15-031348/2015	USA	2015
MN233026	USA/NC88-23626/1988	USA	1988
MN233027	USA/MN88-36695/1988	USA	1988
MN233028	USA/89-47552/1989	USA	1989
MN233029	USA/NJ90-10324/1989	USA	1989
MN233030	USA/IA90-23664/1990	USA	1990
MN233031	USA/IL92-48963/1992	USA	1992
MN233032	USA/IL94-9356/1993	USA	1993
MN233033	USA/MN99-29256/1999	USA	1999
MN233034	USA/LA97-98061/1997	USA	1997
MK333629	SVA/USA/MN/004/2015	USA	2015
MK333630	SVA/USA/MN/005/2015	USA	2015
MK333631	SVA/USA/MN/006/2015	USA	2015
MK333632	SVA/USA/MN/007/2015	USA	2015
MK333633	SVA/USA/MN/009/2016	USA	2016
MK333634	SVA/USA/MN/010/2016	USA	2016
MK333635	SVA/USA/MN/011/2016	USA	2016
MK333636	SVA/USA/MN/012/2016	USA	2016
MK333637	SVA/USA/MN/013/2016	USA	2016
MN433300	SDta/2018	China	2018
MN164664	MN_US_2015	USA	2015
MN812938	USA/IA89-47552/1989	USA	1989
MN812943	USA/TN06-00310/2006	USA	2006
MN812944	USA/MN07-00487/2007	USA	2007
MN812945	USA/MN07-00488/2007	USA	2007
MN812946	USA/IA09-34037/2009	USA	2009
MN812947	USA/HI13-007758/2013	USA	2013
MN812948	USA/HI13-019514/2013	USA	2013
MN812949	USA/HI15-021004/2015	USA	2015
MN812950	USA/GA15-022479/2015	USA	2015
MN812951	USA/ND15-029655/2015	USA	2015
MN812952	USA/GA15-187/2015	USA	2015
MN812953	USA/MI15-21/2015	USA	2015
MN812954	USA/IL15-229/2015	USA	2015
MN812955	USA/CA15-52/2015	USA	2015
MN812956	USA/IA15-64/2015	USA	2015
MN812957	USA/MI15-7/2015	USA	2015
MN812958	USA/MI16-038766/2016	USA	2016
MN812959	USA/MI17-011956/2017	USA	2017
MN812960	USA/WI17-014775/2017	USA	2017
MN017170	GDHY/2018	China	2018
MN423333	CH-GDZQ-2018	China	2018

🚀 Complete Download Strategy

Table S1 lists 249 accession numbers, too many to type by hand, so use one of these approaches:

Method 1: Use NCBI Batch Entrez (Easiest)

Copy all accession numbers from the PDF into a text file
Go to: https://www.ncbi.nlm.nih.gov/sets/batch
Paste the accession numbers
Click "Retrieve" → Download FASTA

Method 2: Create a text file and use command line

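# Create a file with the accessions (one per line).
# This list holds only the 84 accessions read from pages 1 and 5 of Table S1;
# add the rest from the PDF before downloading all 249.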
cat > accessions.txt << 'EOF'
DQ641257
KC667560
KR063107
KR063108
KR063109
KT757280
KT757281
KT757282
KT321458
KU359210
KU359211
KU359212
KU359213
KU359214
KU058182
KU058183
KT827251
KU051391
KU051392
KU051393
KU051394
KX019804
KX377924
KX223836
KX778101
KX857728
KY172968
KU954087
KU954088
KU954089
KU954090
KX751943
KX751944
KX751945
KX759146
KY038016
KY486156
KY486157
KY486158
KY486159
KY486160
MN233024
MN233025
MN233026
MN233027
MN233028
MN233029
MN233030
MN233031
MN233032
MN233033
MN233034
MK333629
MK333630
MK333631
MK333632
MK333633
MK333634
MK333635
MK333636
MK333637
MN433300
MN164664
MN812938
MN812943
MN812944
MN812945
MN812946
MN812947
MN812948
MN812949
MN812950
MN812951
MN812952
MN812953
MN812954
MN812955
MN812956
MN812957
MN812958
MN812959
MN812960
MN017170
MN423333
EOF

# Download all sequences
efetch -db nucleotide -input accessions.txt -format fasta > all_249_sva.fasta

🧬 Next Steps After Download

Once you have downloaded all 249 sequences, here is the complete analysis pipeline:

Step 1: Remove identical sequences (11 removed as per paper)

# Use seqkit to remove duplicates
seqkit rmdup -s all_249_sva.fasta -o all_238_unique.fasta
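For the methods write-up, it helps to record which sequences were identical, not just how many were removed. Below is a sketch that prints each group of records with exactly identical sequences, one group per line with IDs in file order. It assumes a FASTA file with unique IDs. list_identical is a hypothetical name. Its exact, case-sensitive comparison may differ from seqkit's in details.

```shell
#!/usr/bin/env bash
# list_identical FASTA: print groups of record IDs whose sequences are identical.
list_identical() {
  awk '
    /^>/ { if (id != "") s[id] = seq; id = substr($1, 2); ids[++n] = id; seq = ""; next }
    { gsub(/[ \t\r]/, ""); seq = seq $0 }          # join wrapped lines
    END {
      if (id != "") s[id] = seq
      for (i = 1; i <= n; i++) {
        k = s[ids[i]]
        if (k in group) group[k] = group[k] " " ids[i]; else group[k] = ids[i]
        cnt[k]++
      }
      for (k in group) if (cnt[k] > 1) print group[k]
    }
  ' "$1"
}

# Demo (real use: list_identical all_249_sva.fasta > identical_groups.txt)
fa=$(mktemp)
printf '>A\nACGT\n>B\nACGT\n>C\nACGA\n' > "$fa"
list_identical "$fa"    # prints: A B
```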

Step 2: Remove recombinant sequences (5 strains from Table S2)

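# Remove recombinants with seqkit grep (-r: treat the pattern as a regex so that
# versioned IDs such as MH316113.1 match; -v: keep only non-matching records)
seqkit grep -r -v -p "MH316113|MK357117|MN887249|MK357116|MG765559" \
  all_238_unique.fasta -o all_233_final.fasta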
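After both filtering steps, the file should contain 233 records and none of the five recombinants. Below is a quick check, assuming NCBI-style headers that start with the accession. check_filtered is a hypothetical helper, and the demo IDs are made up.

```shell
#!/usr/bin/env bash
# check_filtered FASTA EXPECTED_COUNT [EXCLUDED_ID ...]
# Fails if the record count differs from EXPECTED_COUNT or any excluded ID remains.
check_filtered() {
  local fasta=$1 expected=$2 n id
  shift 2
  n=$(grep -c '^>' "$fasta" || true)   # grep -c exits 1 when the count is 0
  if [ "$n" -ne "$expected" ]; then
    echo "expected $expected records, found $n" >&2
    return 1
  fi
  for id in "$@"; do
    if grep -q "^>$id" "$fasta"; then
      echo "$id is still present" >&2
      return 1
    fi
  done
  echo "ok: $n records, excluded IDs absent"
}

# Demo (real use:
# check_filtered all_233_final.fasta 233 MH316113 MK357117 MN887249 MK357116 MG765559)
fa=$(mktemp)
printf '>KX000001.1 demo\nACGT\n>KX000002.1 demo\nACGA\n' > "$fa"
check_filtered "$fa" 2 MH316113 MK357117
```

Running this check right after filtering would catch a pattern that silently matched nothing, since the count would then stay at 238.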

Step 3: Alignment with MACSE (as in the paper)

# Download MACSE
wget https://github.com/ranwez/MACSE_V2_PIPELINES/releases/download/V2.06/MACSE_v2.06.jar

# Run alignment
java -jar MACSE_v2.06.jar -prog alignSequences \
  -seq all_233_final.fasta \
  -out_NT aligned_sva.fasta \
  -out_AA aligned_sva_AA.fasta

Step 4: Phylogenetic tree (Figure 1)

# IQ-TREE for ML tree
iqtree -s aligned_sva.fasta -m GTR+F+I+G4 -bb 1000 -nt AUTO

# MrBayes for BI tree (as in paper)
mb
execute aligned_sva.nex
lset nst=6 rates=invgamma
mcmc ngen=10000000 samplefreq=1000
sump
sumt

Step 5: Selection analysis (Table 1)

# HyPhy analysis
hyphy busted --alignment aligned_sva.fasta --tree aligned_sva.fasta.treefile
hyphy fel --alignment aligned_sva.fasta --tree aligned_sva.fasta.treefile
hyphy fubar --alignment aligned_sva.fasta --tree aligned_sva.fasta.treefile

⚠️ What You Need To Do Now

Priority	Action
1	Copy ALL accession numbers from the PDF into a text file
2	Use NCBI Batch Entrez to download
3	Run the pipeline above

iTOL: tname

NGPhylogeny.fr

The Pipettes Solution
