About NGS Library Cleanup and Size Selection

High-quality NGS libraries are key to keeping NGS costs down, minimizing the risk of losing important genetic information, minimizing wasted sequencer capacity and maximizing usable data. Effective library cleanup, size selection, quantification and validation are pivotal in generating high-quality NGS results and reliable data.

Library Cleanup and Size Selection

Next-generation sequencing (NGS) involves three basic steps:

          1) Targeted NGS library preparation ('library prep'), including cleanup, size selection and validation

          2) Sequencing

          3) Data analysis


Library preparation is crucial to the success of the NGS workflow. This step makes DNA or RNA samples compatible with a sequencer by ensuring that the target DNA is of the appropriate size and concentration, free of contaminants, and has any required adapters properly ligated.


Sequencing libraries are typically created by fragmenting DNA and adding specialized adapters to both ends. In the Illumina®-based, Reverse Complement PCR (RC-PCR) powered NimaGen EasySeq™ and IDseek™ sequencing workflows, these adapters contain complementary sequences that allow the DNA fragments to bind to the sequencer flow cell. The fragments can then be amplified, followed by reaction cleanup and size selection. Validation of the input DNA is also recommended.


Library cleanup involves the targeted removal of small DNA fragments such as primers, adapters and dimers from a sample mixture before further downstream processing. When these fragments are present at high proportions in a sample, they can cause a variety of problems and interfere with proper sequencing on NGS instruments. They can impair the detection of low-input samples, take up significant amounts of the sequencing capacity and trigger a number of QC warnings. Moreover, they can lead to NGS alignment errors and more unaligned reads, which may reduce data quality.


Various methods are available for library cleanup: magnetic beads, silica column-based methods, or ethanol precipitation.


Size selection is the targeted capture of DNA fragments of a specific size or size range. This relatively inexpensive part of the NGS library prep workflow can have a profound impact on experimental results and data quality. It removes suboptimal nucleic acid fragments from the library before the library is loaded onto the sequencing platform. Part of this is making sure library fragment sizes are within the optimum range for a given instrument, typically 200-500 bp for Illumina® systems.


Various approaches to size selection exist: enzymatic, agarose gel-based, and magnetic bead-based methods, the suitability of each depending on the needs of the experiment.


Library quantification is the assessment of library concentration. Normalization is the dilution of libraries of variable concentration to the same concentration.
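The normalization step comes down to the dilution relation C1 x V1 = C2 x V2. The following is a minimal sketch of that arithmetic, not a validated protocol; the function name and the example concentrations and volume are illustrative assumptions.

```python
from typing import Tuple


def dilution_volumes(stock_nm: float, target_nm: float, final_ul: float) -> Tuple[float, float]:
    """Return (library_ul, diluent_ul) so that the mix reaches target_nm.

    Uses C1 * V1 = C2 * V2, with concentrations in nM and volumes in uL.
    """
    if stock_nm <= 0 or target_nm <= 0 or final_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_nm > stock_nm:
        raise ValueError("target concentration exceeds stock concentration")
    library_ul = target_nm * final_ul / stock_nm
    return library_ul, final_ul - library_ul


# Illustrative values only: a 20 nM library diluted to 4 nM in 50 uL.
lib_ul, diluent_ul = dilution_volumes(stock_nm=20.0, target_nm=4.0, final_ul=50.0)
print(f"{lib_ul:.1f} uL library + {diluent_ul:.1f} uL diluent")
```

Applying the same target concentration to every library gives a set of libraries at equal concentration.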


Library validation is the final NGS library quality control step, prior to sequencing, essential for obtaining high-quality data outputs from a sequencing run. This step is aimed at verification of size, concentration and integrity of the library. 

Magnetic Cleanup and Size Selection

Magnetic beads enable cleanup reactions as well as size selection to recover the fragment lengths needed for the specific sequencing application. They are the most commonly preferred approach in NGS library cleanup, in both manual and automated workflows. The magnetic bead-based approach is well suited for high-throughput applications, and the cost of reagents is low compared to other approaches. These properties make magnetic beads a straightforward solution for optimizing sample preparation.


This type of sample preparation uses inert beads that have a polystyrene core covered in magnetite and a layer of carboxyl molecules. In the presence of a binding buffer containing polyethylene glycol (PEG) and salt, nucleic acids bind to the beads reversibly. The sizes of the fragments that bind to the beads can be controlled by adjusting the ratio of PEG, salt, and beads to nucleic acids. In general, the magnetic beads are supplied in binding buffer and only have to be diluted to tune the required fragment size.
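In practice, this ratio is often expressed as the volume of bead suspension added per volume of sample. The sketch below shows only that volume arithmetic, including a two-step (double-sided) selection as commonly performed with bead-based kits. The function names and the ratios used are illustrative assumptions; which ratio retains which fragment size depends on the bead product and must be taken from the manufacturer's protocol.

```python
from typing import Tuple


def bead_volume_ul(sample_ul: float, ratio: float) -> float:
    """Volume of bead suspension to add for a given bead-to-sample ratio."""
    if sample_ul <= 0 or ratio <= 0:
        raise ValueError("sample volume and ratio must be positive")
    return sample_ul * ratio


def double_sided_volumes(sample_ul: float, low_ratio: float, high_ratio: float) -> Tuple[float, float]:
    """Bead volumes for a two-step size selection.

    Step 1 (low_ratio): beads bind the largest fragments; the supernatant is kept.
    Step 2: extra beads are added to the supernatant so the cumulative ratio,
    relative to the original sample volume, reaches high_ratio. The desired
    fragments now bind, and the smallest fragments stay in the supernatant.
    """
    if high_ratio <= low_ratio:
        raise ValueError("high_ratio must be greater than low_ratio")
    first_add = bead_volume_ul(sample_ul, low_ratio)
    second_add = sample_ul * (high_ratio - low_ratio)
    return first_add, second_add


# Hypothetical ratios, for illustration only.
cleanup_ul = bead_volume_ul(sample_ul=50.0, ratio=1.8)
first_add, second_add = double_sided_volumes(sample_ul=50.0, low_ratio=0.6, high_ratio=0.8)
print(f"cleanup: {cleanup_ul:.1f} uL beads")
print(f"double-sided: {first_add:.1f} uL, then {second_add:.1f} uL")
```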


Once the nucleic acids have bound to the beads, a magnetic force is applied, using either a magnetic rack for 1.5 mL Eppendorf tubes or a microplate-compatible magnet (e.g. Alpaqua® Magnum FLX® or 96S Super Magnet). This separates the beads, and thus the bound fragments, from the remainder of the material. The desired fragments can then be collected either by elution from the beads, triggered by a change in binding buffer (larger fragments), or directly from the supernatant, typically in the case of smaller fragments.


The gold standard in bead-based NGS library cleanup is AMPure XP. AmpliClean™ magnetic beads are a widely adopted alternative to AMPure XP beads for cleanup and size selection.


AMPure XP is a trademark of Beckman Coulter.

Gel-based Size Selection

Alternatively, size selection of the amplified library can be performed on an agarose gel. This is often recommended to check unpurified PCR product prior to a pooled library cleanup strategy.


Pippin Prep is an increasingly implemented preparative electrophoresis platform for collecting size-selected DNA samples. The system automates DNA size selection using disposable, pre-cast agarose cassettes to extract fractions according to user-defined software input. Samples are electro-eluted from agarose and collected in buffer. The platform is used for the construction of next-gen sequencing libraries. Depending on the gel type, the system collects fragments between 90 bp and 8 kb. Compared to manual gel purification methods, a higher quality of sample can be extracted, more reproducibly and with no cross-contamination.


Manual gel-based size selection involves the manual excision of DNA fragments in the size range of interest. It is better suited to low-throughput work. Although laborious, it is a reliable way to perform size selection.


Pippin Prep is a trademark of Sage Science, Inc.

NGS Library Quantification

Following magnetic bead cleanup, the library is ready for a quantitative and qualitative check, prior to transferring it to the sequencer. Library quantification is applied to determine the number of nucleic acid molecules present in a particular volume of your NGS library. This is an important step in the NGS workflow as it measures the concentration of sequencing-ready DNA present, which is essential for obtaining high-quality data outputs from each sequencing run.  
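When the concentration is measured by mass (ng/µL), it can be converted to a molar concentration and a molecule count using the mean fragment length. The sketch below uses the commonly applied approximation of 660 g/mol per base pair of double-stranded DNA; the function names and example values are illustrative assumptions.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate average mass of one dsDNA base pair
AVOGADRO = 6.02214076e23       # molecules per mole


def ng_per_ul_to_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM)."""
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment length > 0")
    return conc_ng_per_ul / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp) * 1e6


def molecules_per_ul(conc_nm: float) -> float:
    """Number of library molecules in one microlitre at the given molarity."""
    # nM -> mol/L (1e-9), then L -> uL (1e-6), then moles -> molecules.
    return conc_nm * 1e-9 * 1e-6 * AVOGADRO


# Illustrative values only: 2.0 ng/uL library with a 400 bp mean fragment size.
library_nm = ng_per_ul_to_nm(conc_ng_per_ul=2.0, mean_fragment_bp=400.0)
print(f"{library_nm:.2f} nM, about {molecules_per_ul(library_nm):.2e} molecules per uL")
```

The mean fragment length should include the adapters, since they are part of every library molecule.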


Accurate library concentrations are even more important if several libraries are pooled for sequencing in parallel. In this case, normalizing the library is essential to ensure even sequence yield or a balanced read distribution across all samples in the library pool.
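For an equimolar pool, each library contributes the same number of molecules, so the volume taken from each library is inversely proportional to its concentration. Because 1 nM equals 1 fmol/µL, the volume is simply the target amount divided by the concentration. This is a minimal sketch with hypothetical library names and values.

```python
from typing import Dict


def equimolar_pool_volumes(conc_nm: Dict[str, float], fmol_per_library: float) -> Dict[str, float]:
    """Volume (uL) to take from each library so that each contributes fmol_per_library.

    1 nM = 1 fmol/uL, so volume_ul = fmol / nM.
    """
    if fmol_per_library <= 0:
        raise ValueError("fmol_per_library must be positive")
    volumes = {}
    for name, conc in conc_nm.items():
        if conc <= 0:
            raise ValueError(f"library {name!r} has a non-positive concentration")
        volumes[name] = fmol_per_library / conc
    return volumes


# Hypothetical libraries and concentrations, for illustration only.
pool = equimolar_pool_volumes({"lib_A": 10.0, "lib_B": 4.0, "lib_C": 8.0}, fmol_per_library=20.0)
for name, vol in pool.items():
    print(f"{name}: {vol:.2f} uL")
```

Libraries with a low concentration require a larger volume, which is one reason some workflows normalize all libraries to a common concentration before pooling.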


Quantification can be carried out using electrophoresis, fluorometry, or qPCR- or dPCR-based methods. A common method is to determine the final concentration of the library from a duplicate Qubit™ HS measurement. Qubit™ provides fast and accurate fluorometric quantification of the DNA in the library.
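A duplicate measurement is typically reduced to a single value by averaging the two readings after checking that they agree. The sketch below illustrates that check; the function name, the 10% agreement threshold and the readings are assumptions for illustration, not values from an instrument manual.

```python
def mean_of_duplicates(first: float, second: float, max_rel_diff: float = 0.10) -> float:
    """Average two readings, rejecting pairs that disagree too strongly.

    max_rel_diff is a hypothetical acceptance threshold: the absolute
    difference between the readings divided by their mean.
    """
    if first <= 0 or second <= 0:
        raise ValueError("readings must be positive")
    mean = (first + second) / 2.0
    rel_diff = abs(first - second) / mean
    if rel_diff > max_rel_diff:
        raise ValueError(f"duplicates differ by {rel_diff:.1%}; repeat the measurement")
    return mean


# Illustrative readings in ng/uL.
library_conc = mean_of_duplicates(2.10, 1.98)
print(f"library concentration: {library_conc:.2f} ng/uL")
```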


Qubit is a trademark of Thermo Fisher Scientific.

Validation of Input DNA

Library validation is the final NGS library quality control step prior to sequencing. During validation, library quality is checked to confirm the expected average library size, to confirm that a library peak is present, and to ensure the absence of additional smaller or larger fragments. This step provides visibility into possible library issues, such as adapter dimers or unexpected library sizes.
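The checks described here can be expressed as a simple summary of an electrophoresis peak table. The sketch below assumes a hypothetical list of (size in bp, signal) pairs exported from the instrument software; the data, the function name and the size window are illustrative only, and acceptance limits should follow the library prep kit documentation.

```python
from typing import Dict, List, Tuple


def summarize_trace(peaks: List[Tuple[float, float]], min_bp: float, max_bp: float) -> Dict[str, object]:
    """Summarize a peak table given as (size_bp, signal) pairs.

    Reports the size of the strongest peak, whether it falls inside the
    expected window, and the fraction of total signal below and above it.
    """
    total = sum(signal for _, signal in peaks)
    if total <= 0:
        raise ValueError("peak table has no signal")
    main_bp, _ = max(peaks, key=lambda p: p[1])
    below = sum(signal for bp, signal in peaks if bp < min_bp) / total
    above = sum(signal for bp, signal in peaks if bp > max_bp) / total
    return {
        "main_peak_bp": main_bp,
        "main_peak_in_range": min_bp <= main_bp <= max_bp,
        "fraction_below": below,  # e.g. primers or adapter dimers
        "fraction_above": above,  # e.g. unexpectedly large fragments
    }


# Invented example data: a small peak at 130 bp, the library peak at 350 bp, a small peak at 900 bp.
report = summarize_trace([(130.0, 5.0), (350.0, 90.0), (900.0, 5.0)], min_bp=200.0, max_bp=500.0)
print(report)
```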


A widely adopted method for library validation is visualizing the library size by running the samples on a Bioanalyzer or TapeStation system. The Agilent 2100 Bioanalyzer system, along with various kits, automates the analysis of size distribution and quantitation of fragmented input DNA with lab-on-a-chip technology. The 2100 Bioanalyzer with the HS DNA kit is ideal for analyzing the size distribution of fragmented input DNA, since it requires only a single nanogram (ng) of DNA for analysis. The TapeStation system is a well-regarded automated electrophoresis-based solution for library quality control of size distribution, concentration and integrity.


Bioanalyzer and TapeStation are trademarks of Agilent.