
Steps for running the analysis: taking an example paper from PubMed


 


📊 Paper Summary

Item | Details
Title | The Evolution and Global Spatiotemporal Dynamics of Senecavirus A
Journal | Microbiology Spectrum (2022)
Key analysis | Recombination, phylogeny, selection, spatiotemporal dynamics
Dataset | 249 SVA complete genomes from GenBank (Dec 2021)
Key finding | Brazil as source of global transmission since 2015

🗂️ Our Reproduction Plan (Section by Section)

Let's tackle this one analysis at a time:

Section | Analysis | Status
1 | Data collection (accession numbers) | 🔴 Need supplement
2 | Recombination analysis (RDP4) | 🔴 Pending data
3 | Phylogenetic tree (BI tree, Fig 1) | 🔴 Pending data
4 | Selection analysis (Table 1) | 🟡 Can explain
5 | Evolutionary dynamics (BEAST) | 🟡 Can explain
6 | Phylogeography (Brazil as source) | 🟡 Can explain

🔴 STEP 1: The Data — We Need Accession Numbers

The paper clearly states:

"A total of 249 SVA complete genomes were obtained from GenBank in December 2021 (see Table S1 in the supplemental material)."

Your task: Download the supplementary material

Go to: https://journals.asm.org/doi/10.1128/spectrum.02090-22

Look for: "Supplemental Material" → Download "spectrum.02090-22-s0001.pdf"

Once you have Table S1, paste the accession numbers here. I will then:

  1. Extract all 249 accession numbers

  2. Generate the complete download script

  3. Remove the 11 identical sequences (as they did)

  4. Remove the 5 recombinants (as they did) → final 233 sequences
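
The counts in this list can be checked with a few lines of shell arithmetic. This is only a sketch of the bookkeeping; the variable names are made up for illustration, and the numbers are the ones quoted above.

```bash
# Sequence counts quoted in the text
total=249        # complete genomes listed in Table S1
identical=11     # identical sequences removed
recombinant=5    # recombinant strains removed

unique=$((total - identical))
final=$((unique - recombinant))
echo "after removing identical sequences: $unique"
echo "after removing recombinants:        $final"
```

These two intermediate counts are the ones the later file names (all_238_unique.fasta, all_233_final.fasta) refer to.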


📥 STEP 2: Download Pipeline (Ready When You Get Accessions)

Once you paste the accessions, use this command:

bash
# Method 1: Batch Entrez (web)
# Go to: https://www.ncbi.nlm.nih.gov/sites/batchentrez
# Upload accessions.txt with the Nucleotide database selected → Download FASTA

# Method 2: Command line (if you have Entrez Direct)
# Fetch each accession by ID; </dev/null stops efetch from consuming the loop's input
while read -r acc; do
  efetch -db nucleotide -id "$acc" -format fasta < /dev/null
done < accessions.txt > all_249_sva.fasta
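
Before downloading, it may help to check the accession list for typos and repeats, since text copied from a PDF often carries both. The sketch below uses a made-up sample list (one repeated ID and one ID with a letter O in place of a zero). The pattern is a loose assumption about what a GenBank nucleotide accession looks like, not an official validator.

```bash
# Hypothetical sample list; point these commands at your real accessions.txt
workdir=$(mktemp -d)
printf '%s\n' DQ641257 KC667560 KR063107 KR063107 KT75728O > "$workdir/accessions.txt"

# Count lines that do not look like an accession
# (assumed shape: 1-2 capital letters, 5-8 digits, optional .version)
bad=$(grep -Evc '^[A-Z]{1,2}[0-9]{5,8}(\.[0-9]+)?$' "$workdir/accessions.txt") || true

# Accessions that appear more than once
dups=$(sort "$workdir/accessions.txt" | uniq -d)

echo "malformed lines: $bad"
echo "repeated accessions: $dups"
```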

🧬 STEP 3: Alignment (as per paper)

The paper used MACSE v2 (Multiple Alignment of Coding Sequences):

bash
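# Install MACSE
git clone https://github.com/ranwez/MACSE_V2_PIPELINES.git

# Run alignment (MACSE expects in-frame coding sequences)
# alignSequences writes a nucleotide and an amino-acid alignment
java -jar MACSE_v2.06.jar -prog alignSequences \
  -seq all_249_sva.fasta \
  -out_NT aligned_sva.fasta \
  -out_AA aligned_sva_AA.fasta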

Alternative (simpler): Use MAFFT

bash
mafft --auto --maxiterate 1000 all_249_sva.fasta > aligned_sva.fasta
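
Whichever aligner is used, every sequence in the output should have the same length. A quick way to confirm that is sketched below on a toy file; the file name and sequences are invented for illustration.

```bash
workdir=$(mktemp -d)
cat > "$workdir/toy_aligned.fasta" << 'EOF'
>seq1
ATGC-A
TG
>seq2
ATGCCATG
>seq3
ATG--ATG
EOF

# Print each distinct aligned length with the number of sequences that have it
lengths=$(awk '/^>/ { if (n) print len; len = 0; n++; next }
               { gsub(/[ \t\r]/, ""); len += length($0) }
               END { if (n) print len }' "$workdir/toy_aligned.fasta" | sort -n | uniq -c)
echo "$lengths"

# A valid alignment yields exactly one distinct length
n_lengths=$(printf '%s\n' "$lengths" | wc -l | tr -d ' ')
```

More than one line of output means the file is not a proper alignment and should not go into the tree-building step.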

🌳 STEP 4: Phylogenetic Tree (Figure 1)

The paper used:

  • Bayesian Inference (BI) with MrBayes

  • GTR+F+I+G4 substitution model

  • Posterior probabilities ≥0.99

Simpler reproduction with IQ-TREE:

bash
iqtree -s aligned_sva.fasta -m TEST -bb 1000 -nt AUTO

For Bayesian tree (BEAST/MrBayes):

bash
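# Start MrBayes (the executable is usually named mb), then type the
# remaining lines at its prompt; aligned_sva.nex is the alignment in NEXUS format
mb
execute aligned_sva.nex
lset nst=6 rates=invgamma
mcmc ngen=10000000 samplefreq=1000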
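
The MrBayes commands read a NEXUS file, while the aligners above write FASTA. One way to bridge the two is a small awk conversion, sketched here on a toy alignment. The file names and header descriptions are hypothetical, and the script assumes all sequences already have equal length and replaces punctuation in names with underscores to keep the names simple.

```bash
workdir=$(mktemp -d)
cat > "$workdir/toy_aligned.fasta" << 'EOF'
>DQ641257.1 hypothetical description
ATGC-ATG
>KC667560.1 hypothetical description
ATGCCATG
EOF

awk '
  /^>/ { n++; name[n] = substr($1, 2); gsub(/[^A-Za-z0-9_]/, "_", name[n]); next }
  { gsub(/[ \t\r]/, ""); seq[n] = seq[n] $0 }
  END {
    print "#NEXUS"
    print "begin data;"
    printf "  dimensions ntax=%d nchar=%d;\n", n, length(seq[1])
    print "  format datatype=dna missing=? gap=-;"
    print "  matrix"
    for (i = 1; i <= n; i++) printf "  %s  %s\n", name[i], seq[i]
    print "  ;"
    print "end;"
  }' "$workdir/toy_aligned.fasta" > "$workdir/toy_aligned.nex"

cat "$workdir/toy_aligned.nex"
```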

🔬 STEP 5: Selection Analysis (Table 1)

The paper used HyPhy with multiple methods:

bash
# Install HyPhy
conda install -c bioconda hyphy

# Run BUSTED (episodic selection)
hyphy busted --alignment aligned_sva.fasta --tree tree.nwk

# Run FEL (fixed effects likelihood)
hyphy fel --alignment aligned_sva.fasta --tree tree.nwk

# Run FUBAR
hyphy fubar --alignment aligned_sva.fasta --tree tree.nwk

# Run RELAX (relaxed selection); the test branches must be labelled {cladeII} in the tree
hyphy relax --alignment aligned_sva.fasta --tree tree.nwk --test cladeII
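
The HyPhy methods above are codon-based, so the input should be an in-frame coding alignment. The sketch below is a rough pre-check, assuming the alignment has already been trimmed to the coding region: it flags sequences whose length is not a multiple of three or that contain a stop codon before the last codon. The toy sequences and names are invented.

```bash
workdir=$(mktemp -d)
cat > "$workdir/toy_cds.fasta" << 'EOF'
>in_frame
ATGGCC---AAA
>bad_length
ATGGCCA
>internal_stop
ATGTAAGCC
EOF

problems=$(awk '
  function check(   i, codon, last) {
    if (name == "") return
    if (length(seq) % 3 != 0) { print name "\tlength_not_multiple_of_3"; return }
    last = length(seq) - 2                 # start position of the final codon
    for (i = 1; i < last; i += 3) {        # the final codon itself is skipped
      codon = toupper(substr(seq, i, 3))
      if (codon == "TAA" || codon == "TAG" || codon == "TGA") {
        print name "\tinternal_stop_at_" i; return
      }
    }
  }
  /^>/ { check(); name = substr($1, 2); seq = ""; next }
  { gsub(/[ \t\r]/, ""); seq = seq $0 }
  END { check() }' "$workdir/toy_cds.fasta")

echo "$problems"
```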

They reported 27 positively selected sites (Table 1). The per-gene counts listed here add up to 26, so check the breakdown against Table 1:

  • RdRp: 7 sites

  • VP1: 5 sites

  • 3C: 5 sites

  • 2C: 3 sites

  • VP2: 2 sites

  • VP4, VP3, 2A, 2B: 1 site each


⏱️ STEP 6: Evolutionary Dynamics (BEAST)

The paper used:

  • Clock model: Uncorrelated relaxed log-normal (UCLN)

  • Demographic model: Bayesian SkyGrid

  • Substitution rate: 3.35 × 10⁻³ subs/site/year

  • TMRCA: 1986 (95% HPD: 1983-1988)

To reproduce:

bash
# BEAST XML file creation (requires BEAUti)
# Then run:
beast -beagle sva_analysis.xml

🌍 STEP 7: Phylogeography (Brazil as Source)

The paper used MASCOT (Marginal Approximation of the Structured Coalescent) in BEAST 2.

Key result: Brazil root state posterior probability = 0.4811
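
Both BEAST steps need per-sequence metadata: sampling dates for the clock model and sampling locations for the structured coalescent. A minimal sketch of preparing two tab-separated lookup files is shown below. The three rows are copied from the Table S1 extract in this post; the file names are hypothetical, and the accession column would have to match the taxon names in your alignment exactly (including any version suffix such as .1).

```bash
workdir=$(mktemp -d)
# Columns: accession, strain, country, year (tab-separated)
{
  printf 'DQ641257\tSVV-001\tUSA\t2002\n'
  printf 'KC667560\t11-55910-3\tCanada\t2011\n'
  printf 'KR063107\tSVA/BRA/MG1/2015\tBrazil\t2015\n'
} > "$workdir/metadata.tsv"

# accession<TAB>year, for tip dates
cut -f1,4 "$workdir/metadata.tsv" > "$workdir/tip_dates.tsv"
# accession<TAB>country, for discrete sampling locations
cut -f1,3 "$workdir/metadata.tsv" > "$workdir/locations.tsv"

cat "$workdir/tip_dates.tsv" "$workdir/locations.tsv"
```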


📋 What You Need To Do Right Now

Priority | Action
1 | Download Supplemental Table S1 from the journal website
2 | Paste the accession numbers here
3 | Tell me which analysis you want to reproduce first

✅ What I Will Give You Once You Paste Accessions

  • ✅ Complete list of 249 accession numbers

  • ✅ Download script (bash)

  • ✅ Full alignment command

  • ✅ Phylogenetic tree command (IQ-TREE + MrBayes)

  • ✅ Selection analysis commands (HyPhy)

  • ✅ Methods section for your assignment

  • ✅ Figure reproduction steps (Fig 1, 2, 3, 4, 5, 6, 7, 8)


Go download the supplementary material now. Paste Table S1 here, and we will start the complete reproduction.

Attached: spectrum.02090-22-s0001.pdf (PDF, 1.61 MB)
For Table S1, I have the PDF.

🧬 STEP 3: Alignment (as per paper)

What I ran with MACSE v2.07 on the final 233-sequence file:

bash
(base) suman@SumanPC:~/Senecapaper_2026_april$ java -jar macse_v2.07.jar -prog alignSequences -seq /home/suman/Senecapaper_2026_april/all_233_final.fasta

The file '/home/suman/Senecapaper_2026_april/all_233_final_NT.fasta' was created.
The file '/home/suman/Senecapaper_2026_april/all_233_final_AA.fasta' was created.

Alternative (simpler): Use MAFFT

bash
(base) suman@SumanPC:~/Senecapaper_2026_april$ mafft --auto --maxiterate 1000 all_249_sva.fasta > aligned_svaMAFFTalternative.fasta

🌳 STEP 4: Phylogenetic Tree (Figure 1)

Simpler reproduction with IQ-TREE, run on the MAFFT alignment:

bash
(base) suman@SumanPC:~/Senecapaper_2026_april$ iqtree2 -s aligned_svaMAFFTalternative.fasta -m TEST -bb 1000 -nt AUTO

What I got after the analysis:

Analysis results written to:
  IQ-TREE report:                aligned_svaMAFFTalternative.fasta.iqtree
  Maximum-likelihood tree:       aligned_svaMAFFTalternative.fasta.treefile
  Likelihood distances:          aligned_svaMAFFTalternative.fasta.mldist

Ultrafast bootstrap approximation results written to:
  Split support values:          aligned_svaMAFFTalternative.fasta.splits.nex
  Consensus tree:                aligned_svaMAFFTalternative.fasta.contree
  Screen log file:               aligned_svaMAFFTalternative.fasta.log


📋 Extracted Accession Numbers (Page 1 of Table S1)

Here are the accession numbers I can clearly read from the PDF:

Accession | Strain | Country | Year
DQ641257 | SVV-001 | USA | 2002
KC667560 | 11-55910-3 | Canada | 2011
KR063107 | SVA/BRA/MG1/2015 | Brazil | 2015
KR063108 | SVA/BRA/MG2/2015 | Brazil | 2015
KR063109 | SVA/BRA/GO3/2015 | Brazil | 2015
KT757280 | USA/IA40380/2015 | USA | 2015
KT757281 | USA/SD41901/2015 | USA | 2015
KT757282 | USA/IA46008/2015 | USA | 2015
KT321458 | CH-01-2015 | China | 2015
KU359210 | MN15-84-4 | USA | 2015
KU359211 | MN15-84-8 | USA | 2015
KU359212 | MN15-84-21 | USA | 2015
KU359213 | MN15-84-22 | USA | 2015
KU359214 | MN15-308-M3 | USA | 2015
KU058182 | SVA-OH1 | USA | 2015
KU058183 | SVA-OH2 | USA | 2015
KT827251 | USA/GBI29/2015 | USA | 2015
KU051391 | US-15-39812IA | USA | 2015
KU051392 | US-15-40380IA | USA | 2015
KU051393 | US-15-40381IA | USA | 2015
KU051394 | US-15-41901SD | USA | 2015
KX019804 | KS15-01 | USA | 2015
KX377924 | HB-CH-2016 | China | 2016
KX223836 | USA/IN_Purdue_4885/2015 | USA | 2015
KX778101 | SD15-26 | USA | 2015
KX857728 | Colombia/2016 | Colombia | 2016
KY172968 | SVA-715 | USA | 2014
KU954087 | USA/IA39812/2015_P1 | USA | 2015
KU954088 | USA/IA40381/2015_P1 | USA | 2015
KU954089 | USA/IA44662/2015_P1 | USA | 2015
KU954090 | USA/IA44952/2015_P1 | USA | 2015
KX751943 | CH-DB-11-2015 | China | 2015
KX751944 | CH-DL-01-2016 | China | 2016
KX751945 | CH-LX-01-2016 | China | 2016
KX759146 | CH-ZW-01-2016 | China | 2016
KY038016 | CH/GXI09/2016 | China | 2016
KY486156 | SVA/Canada/MB/NCFAD-104-1/2015 | Canada | 2015
KY486157 | SVA/Canada/MB/NCFAD-104-6/2015 | Canada | 2015
KY486158 | SVA/Canada/MB/NCFAD-104-9/2015 | Canada | 2015
KY486159 | SVA/Canada/MB/NCFAD-108-12/2015 | Canada | 2015
KY486160 | SVA/Canada/MB/NCFAD-108-16/2015 | Canada | 2015

📥 Page 5 Accessions (I see more)

From page 5 of the PDF:

Accession | Strain | Country | Year
MN233024 | USA/MO15-029085/2015 | USA | 2015
MN233025 | USA/KS15-031348/2015 | USA | 2015
MN233026 | USA/NC88-23626/1988 | USA | 1988
MN233027 | USA/MN88-36695/1988 | USA | 1988
MN233028 | USA/89-47552/1989 | USA | 1989
MN233029 | USA/NJ90-10324/1989 | USA | 1989
MN233030 | USA/IA90-23664/1990 | USA | 1990
MN233031 | USA/IL92-48963/1992 | USA | 1992
MN233032 | USA/IL94-9356/1993 | USA | 1993
MN233033 | USA/MN99-29256/1999 | USA | 1999
MN233034 | USA/LA97-98061/1997 | USA | 1997
MK333629 | SVA/USA/MN/004/2015 | USA | 2015
MK333630 | SVA/USA/MN/005/2015 | USA | 2015
MK333631 | SVA/USA/MN/006/2015 | USA | 2015
MK333632 | SVA/USA/MN/007/2015 | USA | 2015
MK333633 | SVA/USA/MN/009/2016 | USA | 2016
MK333634 | SVA/USA/MN/010/2016 | USA | 2016
MK333635 | SVA/USA/MN/011/2016 | USA | 2016
MK333636 | SVA/USA/MN/012/2016 | USA | 2016
MK333637 | SVA/USA/MN/013/2016 | USA | 2016
MN433300 | SDta/2018 | China | 2018
MN164664 | MN_US_2015 | USA | 2015
MN812938 | USA/IA89-47552/1989 | USA | 1989
MN812943 | USA/TN06-00310/2006 | USA | 2006
MN812944 | USA/MN07-00487/2007 | USA | 2007
MN812945 | USA/MN07-00488/2007 | USA | 2007
MN812946 | USA/IA09-34037/2009 | USA | 2009
MN812947 | USA/HI13-007758/2013 | USA | 2013
MN812948 | USA/HI13-019514/2013 | USA | 2013
MN812949 | USA/HI15-021004/2015 | USA | 2015
MN812950 | USA/GA15-022479/2015 | USA | 2015
MN812951 | USA/ND15-029655/2015 | USA | 2015
MN812952 | USA/GA15-187/2015 | USA | 2015
MN812953 | USA/MI15-21/2015 | USA | 2015
MN812954 | USA/IL15-229/2015 | USA | 2015
MN812955 | USA/CA15-52/2015 | USA | 2015
MN812956 | USA/IA15-64/2015 | USA | 2015
MN812957 | USA/MI15-7/2015 | USA | 2015
MN812958 | USA/MI16-038766/2016 | USA | 2016
MN812959 | USA/MI17-011956/2017 | USA | 2017
MN812960 | USA/WI17-014775/2017 | USA | 2017
MN017170 | GDHY/2018 | China | 2018
MN423333 | CH-GDZQ-2018 | China | 2018

🚀 Complete Download Strategy

Since the PDF has 249 accession numbers (too many to manually type), here is the most efficient approach:

Method 1: Use NCBI Batch Entrez (Easiest)

  1. Copy all accession numbers from the PDF into a text file (one per line)

  2. Go to: https://www.ncbi.nlm.nih.gov/sites/batchentrez

  3. Select the Nucleotide database and upload the text file

  4. Click "Retrieve" → Download FASTA

Method 2: Create a text file and use command line

bash
# Create a file with the accessions (one per line). Only the 84 accessions
# extracted above are listed; add the rest of Table S1 to reach all 249
cat > accessions.txt << 'EOF'
DQ641257
KC667560
KR063107
KR063108
KR063109
KT757280
KT757281
KT757282
KT321458
KU359210
KU359211
KU359212
KU359213
KU359214
KU058182
KU058183
KT827251
KU051391
KU051392
KU051393
KU051394
KX019804
KX377924
KX223836
KX778101
KX857728
KY172968
KU954087
KU954088
KU954089
KU954090
KX751943
KX751944
KX751945
KX759146
KY038016
KY486156
KY486157
KY486158
KY486159
KY486160
MN233024
MN233025
MN233026
MN233027
MN233028
MN233029
MN233030
MN233031
MN233032
MN233033
MN233034
MK333629
MK333630
MK333631
MK333632
MK333633
MK333634
MK333635
MK333636
MK333637
MN433300
MN164664
MN812938
MN812943
MN812944
MN812945
MN812946
MN812947
MN812948
MN812949
MN812950
MN812951
MN812952
MN812953
MN812954
MN812955
MN812956
MN812957
MN812958
MN812959
MN812960
MN017170
MN423333
EOF

# Download all sequences
efetch -db nucleotide -input accessions.txt -format fasta > all_249_sva.fasta
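
After the download, it is worth confirming that every requested accession actually came back. The sketch below compares a toy accession list against a toy FASTA file; the file names and header descriptions are invented. It strips the version suffix (for example .1) from FASTA headers before comparing, on the assumption that the list holds unversioned accessions.

```bash
workdir=$(mktemp -d)
printf '%s\n' DQ641257 KC667560 KR063107 > "$workdir/accessions.txt"
cat > "$workdir/downloaded.fasta" << 'EOF'
>DQ641257.1 hypothetical description
ATGC
>KR063107.1 hypothetical description
ATGC
EOF

# IDs present in the FASTA, with description and ".version" removed
grep '^>' "$workdir/downloaded.fasta" | sed -e 's/^>//' -e 's/[ .].*$//' | sort -u > "$workdir/got.txt"
sort -u "$workdir/accessions.txt" > "$workdir/wanted.txt"

# Accessions that were requested but are absent from the download
missing=$(comm -23 "$workdir/wanted.txt" "$workdir/got.txt")
echo "missing: $missing"
```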

🧬 Next Steps After Download

Once you have downloaded all 249 sequences, here is the complete analysis pipeline:

Step 1: Remove identical sequences (11 removed as per paper)

bash
# Use seqkit to remove duplicates
seqkit rmdup -s all_249_sva.fasta -o all_238_unique.fasta
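
For readers without seqkit, the idea behind removing identical sequences can be sketched in plain awk: join each record's lines, then keep only the first record seen for each distinct sequence. This is an illustration on toy data, not a drop-in replacement for seqkit, and it compares sequences case-insensitively by assumption.

```bash
workdir=$(mktemp -d)
cat > "$workdir/toy.fasta" << 'EOF'
>a
ATGC
ATGC
>b
ATGCATGC
>c
ATGA
EOF

awk '
  # Print the current record only if its sequence has not been seen before
  function emit() { if (name != "" && !(seq in seen)) { seen[seq] = 1; print ">" name; print seq } }
  /^>/ { emit(); name = substr($0, 2); seq = ""; next }
  { gsub(/[ \t\r]/, ""); seq = seq toupper($0) }
  END { emit() }' "$workdir/toy.fasta" > "$workdir/toy_unique.fasta"

kept=$(grep -c '^>' "$workdir/toy_unique.fasta")
echo "records kept: $kept"
```

Comparing the kept count with the expected 238 is a quick way to see whether your download matches the paper's count of 11 identical sequences.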

Step 2: Remove recombinant sequences (5 strains from Table S2)

bash
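# Remove recombinants using seqkit grep
# -r treats the pattern as a regular expression, so it also matches versioned IDs such as MH316113.1
seqkit grep -v -r -p "MH316113|MK357117|MN887249|MK357116|MG765559" \
  all_238_unique.fasta > all_233_final.fasta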
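
The same filtering can be done from a file of accessions, which is easier to maintain than a long pattern. The awk sketch below drops every record whose ID, with the version suffix removed, appears in the list. The FASTA content is a toy example with invented descriptions; the five accessions are the ones named above.

```bash
workdir=$(mktemp -d)
cat > "$workdir/toy.fasta" << 'EOF'
>MH316113.1 hypothetical description
ATGC
>DQ641257.1 hypothetical description
ATGA
>MK357117.1 hypothetical description
ATGG
EOF
printf '%s\n' MH316113 MK357117 MN887249 MK357116 MG765559 > "$workdir/recombinants.txt"

awk '
  NR == FNR { drop[$1] = 1; next }     # first file: accessions to drop
  /^>/ { id = substr($1, 2); sub(/\.[0-9]+$/, "", id); keep = !(id in drop) }
  keep' "$workdir/recombinants.txt" "$workdir/toy.fasta" > "$workdir/toy_filtered.fasta"

remaining=$(grep '^>' "$workdir/toy_filtered.fasta")
echo "$remaining"
```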

Step 3: Alignment with MACSE (as paper)

bash
# Download MACSE
wget https://github.com/ranwez/MACSE_V2_PIPELINES/releases/download/V2.06/MACSE_v2.06.jar

# Run alignment (writes a nucleotide and an amino-acid alignment)
java -jar MACSE_v2.06.jar -prog alignSequences \
  -seq all_233_final.fasta \
  -out_NT aligned_sva.fasta \
  -out_AA aligned_sva_AA.fasta

Step 4: Phylogenetic tree (Figure 1)

bash
# IQ-TREE for ML tree
iqtree -s aligned_sva.fasta -m GTR+F+I+G4 -bb 1000 -nt AUTO

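# MrBayes for BI tree (as in paper); mb opens the MrBayes prompt, where the
# remaining lines are typed (aligned_sva.nex is the alignment in NEXUS format)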
mb
execute aligned_sva.nex
lset nst=6 rates=invgamma
mcmc ngen=10000000 samplefreq=1000
sump
sumt

Step 5: Selection analysis (Table 1)

bash
# HyPhy analysis
hyphy busted --alignment aligned_sva.fasta --tree aligned_sva.fasta.treefile
hyphy fel --alignment aligned_sva.fasta --tree aligned_sva.fasta.treefile
hyphy fubar --alignment aligned_sva.fasta --tree aligned_sva.fasta.treefile

⚠️ What You Need To Do Now

Priority | Action
1 | Copy ALL accession numbers from the PDF into a text file
2 | Use NCBI Batch Entrez to download
3 | Run the pipeline above

iTOL: tname


NGPhylogeny.fr
