Patsnap: Supercharging FTO Search with Degenerate Sequence Searching

Patsnap

2023-07-10 18:24

LONDON, July 10, 2023 /PRNewswire/ -- Statistics from Patsnap's biological sequence database (Bio) show that degenerate ("special") sequences are far from rare in global patent literature: approximately 7.4 million nucleotide sequences, or 7.12% of all nucleotide sequences, and 1.31 million protein sequences, or 7.55% of all protein sequences. Because the special symbols in these generic sequences can distort search results, they pose substantial risks for freedom-to-operate (FTO) analyses.

Patsnap's Solution:

Therefore, to mitigate the risk of overlooking these critical sequences, Patsnap's Algorithm Engineering Team has developed a deep learning model using in-house NLP, CV, entity recognition, and coreference resolution technologies.

This model identifies and parses degenerate sequences and their substitutions in sequence listings and full-text patents. Its output forms the Degenerate Sequence Searching Database, offered as part of Patsnap's Bio Professional package.
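To make the parsing task concrete, the sketch below shows the kind of structured output such a step might produce. It is a rule-based stand-in using regular expressions, not Patsnap's deep learning model; the claim text, the function name `parse_degenerate_claim`, and the output layout are all hypothetical and assume clean, well-punctuated input.

```python
import re

# Hypothetical, already-cleaned claim text; real patent text is far noisier.
CLAIM = (
    "EVGSYX1X2CX3SGRS, wherein X1 is A, D, or Y, "
    "X2 is A, F, N, S, or V, and X3 is D, P, or V."
)

# One token per sequence position: either a placeholder (X1, X2, ...) or a residue.
TOKEN = re.compile(r"X\d+|[A-Z]")
# "X<n> is <list>", where the list ends at the next definition or at a period.
DEFINITION = re.compile(r"X(\d+)\s+is\s+(.+?)(?=,?\s*(?:and\s+)?X\d+\s+is|\.|$)")
# A single upper-case residue letter standing on its own.
LETTER = re.compile(r"\b[A-Z]\b")


def parse_degenerate_claim(text):
    """Split a claim into its position tokens and per-placeholder substitutions."""
    formula, _, rest = text.partition(",")
    tokens = TOKEN.findall(formula.strip())
    substitutions = {
        f"X{number}": sorted(set(LETTER.findall(body)))
        for number, body in DEFINITION.findall(rest)
    }
    return tokens, substitutions


tokens, substitutions = parse_degenerate_claim(CLAIM)
print(tokens)
print(substitutions)
```

A rule like this breaks quickly on OCR errors or unusual phrasing, which is presumably why a learned model is used in practice.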

Using a specialized sequence alignment algorithm, this database not only enables the retrieval of such sequences but also provides a true similarity score. Therefore, by performing searches within the degenerate sequence database, we can effectively mitigate the risk of inadvertently overlooking crucial information during freedom to operate (FTO) and novelty searches.

Given the potential scale of variations in degenerate sequences, which can reach the tens of billions, traditional sequence alignment algorithms fail to meet the real-time retrieval demands. Patsnap tackles this challenge by employing a deeply customized sequence alignment algorithm that dynamically loads substitution information for degenerate sequences during the retrieval process, ensuring precise retrieval within reasonable time frames.

During the scanning phase, Patsnap introduces a compression algorithm to construct a seed word table for heuristic searches, significantly reducing unnecessary comparisons and improving retrieval efficiency. When aligning query sequences with target sequences, Patsnap's proprietary algorithm incorporates degenerate substitution information, resulting in more accurate alignment and query results, as well as more intuitive and visually appealing alignment outcomes for different variants of the query sequence and target sequence.
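The idea of a seed word table can be illustrated with a plain k-mer index. This is a minimal sketch under stated assumptions, not Patsnap's compression algorithm: the seed length `K`, the function names, and the "unrelated" target sequence are hypothetical, and seeds that contain a degenerate position are simply skipped.

```python
from collections import defaultdict

K = 4  # seed length; an arbitrary choice for this sketch


def build_seed_table(targets, k=K):
    """Map every fixed (non-degenerate) k-mer to the targets that contain it."""
    table = defaultdict(set)
    for target_id, seq in targets.items():
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if "X" not in word:  # skip seeds that cover a degenerate position
                table[word].add(target_id)
    return table


def candidates(query, table, k=K):
    """Return the targets sharing at least one seed word with the query."""
    hits = set()
    for i in range(len(query) - k + 1):
        hits |= table.get(query[i:i + k], set())
    return hits


targets = {
    "degenerate": "EVGSYXXXXXXCXXXXXXCXXSGRSAGGGGTENLYFQGSGGS",
    "unrelated": "MKTAYIAKQRQISFVKSHFSRQ",  # made-up sequence for contrast
}
table = build_seed_table(targets)
query = "EVGSYPAPSDACPSDYFYCDASGRSAGGGGTENLYFQGSGGS"
print(candidates(query, table))
```

Only targets that share a seed with the query go on to full alignment, which is how a heuristic search avoids most unnecessary comparisons.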

Biological sequences form the bedrock of innovation in biotechnology, with countless advancements revolving around these sequences. However, the unique nature of biological sequences poses a challenge for conventional keyword-based information retrieval methods, often leading to the oversight of crucial information and potential risks.

The sequences recited in patent claims cover a wide range of variations: a claim typically protects not only the listed sequence itself but also any sequence within a specified level of homology to it. As a result, researchers rely heavily on sequence homology alignment algorithms to search sequence databases, applying predefined homology thresholds to ensure comprehensive results. This approach is widely used in current biological sequence database searches.

Nevertheless, a pressing question remains: can these similar sequence searches genuinely identify all potential target sequences? While these methods have proven effective, their ability to capture every relevant sequence warrants further examination. It is crucial to explore the limitations of current search methodologies and strive for enhanced approaches that leave no potential target sequence undiscovered. 

Special Sequences in Patents 

Combining similar-sequence searches with keyword-based result aggregation significantly reduces the risk of overlooking crucial information and FTO issues.

However, sequences in patents differ from those found in other biological databases because they exhibit many "patent-specific" characteristics. To expand the scope of patent protection and create search barriers for competitors, patent drafters often use a description method similar to the "Markush structure" used in chemistry. They insert degenerate symbols, wildcards, operators, and other information into the parent sequence and define the specific parameters of these symbols in accompanying explanatory text. We refer to such sequences as "Degenerate Sequences."

The excerpt below shows a degenerate sequence described in patent claims:

25. The library of any one of claims 1-24, wherein the polypeptide comprises an amino acid sequence according to Formula (III):

EVGSYX₁X₂X₃X₄X₅X₆CX₇X₈X₉X₁₀X₁₁X₁₂CX₁₃X₁₄SGRSAGGGGTENLYFQGSGGS (SEQ ID NO: 3), wherein X₁ is A, D, I, N, P, or Y, X₂ is A, F, N, S, or V, X₃ is A, H, L, P, S, V, or Y, X₄ is A, H, S, or Y, X₅ is A, D, P, S, V, or Y, X₆ is A, D, L, S, or Y, X₇ is D, P, or V, X₈ is A, D, H, P, S, or T, X₉ is A, D, F, H, P, or Y, X₁₀ is L, P, or Y, X₁₁ is F, P, or Y, X₁₂ is A, P, S, or Y, X₁₃ is A, D, N, S, T, or Y, and X₁₄ is A, S, or Y.

26. The library of claim 25, wherein each of the polynucleotides in the library encodes a polypeptide comprising an amino acid sequence according to Formula (III).

27. The library of any one of claims 1-26, wherein the polypeptide comprises an amino acid sequence selected from the group consisting of SEQ ID NOS: 25-46.

28. The library of any one of claims 1-27, wherein the TBM comprises an antibody light chain variable region.

29. The library of claim 28, wherein the polypeptide further comprises a heavy chain variable region C-terminal to the light chain variable region.
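The substitution lists in claim 25 above give a sense of the scale involved. The sketch below counts how many concrete sequences Formula (III) covers by multiplying the number of options at each placeholder. The lists are transcribed from the excerpt as read here, and the variable names are illustrative.

```python
from math import prod

# Permitted residues for X1..X14, transcribed from claim 25 as quoted above.
SUBSTITUTIONS = {
    "X1": "ADINPY", "X2": "AFNSV", "X3": "AHLPSVY", "X4": "AHSY",
    "X5": "ADPSVY", "X6": "ADLSY", "X7": "DPV", "X8": "ADHPST",
    "X9": "ADFHPY", "X10": "LPY", "X11": "FPY", "X12": "APSY",
    "X13": "ADNSTY", "X14": "ASY",
}

# Each placeholder is chosen independently, so the counts multiply.
variant_count = prod(len(options) for options in SUBSTITUTIONS.values())
print(f"{variant_count:,}")  # 1,763,596,800
```

A single claim therefore stands for roughly 1.76 billion distinct sequences, which is why expanding every variant in advance and aligning each one separately is impractical.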

Degenerate sequences themselves have no biological significance; they exist solely to serve the purposes of the patent. Combined with a stated homology range, however, this drafting approach not only protects the invention comprehensively but also deals a "decisive blow" to conventional sequence homology search methods. Consider the example below.

Query sequence:

"EVGSYPAPSDACPSDYFYCDASGRSAGGGGTENLYFQGSGGS" 

Target sequence: 

"EVGSYXXXXXXCXXXXXXCXXSGRSAGGGGTENLYFQGSGGS"

The similarity score obtained from the BLAST algorithm is only 67%, but the actual similarity is 100%. 
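The gap between the two figures can be reproduced with simple per-position arithmetic. The sketch below is not BLAST and not Patsnap's alignment algorithm: it assumes an ungapped, equal-length comparison, and the function names are hypothetical. The permitted residues for each X are taken from Formula (III) in the claim excerpt quoted earlier.

```python
QUERY = "EVGSYPAPSDACPSDYFYCDASGRSAGGGGTENLYFQGSGGS"
TARGET = "EVGSYXXXXXXCXXXXXXCXXSGRSAGGGGTENLYFQGSGGS"

# Permitted residues for X1..X14, in order of appearance in the target.
ALLOWED = [
    "ADINPY", "AFNSV", "AHLPSVY", "AHSY", "ADPSVY", "ADLSY", "DPV",
    "ADHPST", "ADFHPY", "LPY", "FPY", "APSY", "ADNSTY", "ASY",
]


def naive_identity(query, target):
    """Fraction of positions with identical letters; X never matches."""
    matches = sum(q == t for q, t in zip(query, target))
    return matches / len(target)


def degenerate_identity(query, target, allowed):
    """Like naive_identity, but an X matches any residue it permits."""
    options = iter(allowed)
    matches = 0
    for q, t in zip(query, target):
        if t == "X":
            matches += q in next(options)
        else:
            matches += q == t
    return matches / len(target)


print(f"{naive_identity(QUERY, TARGET):.0%}")                 # 67%
print(f"{degenerate_identity(QUERY, TARGET, ALLOWED):.0%}")   # 100%
```

Of the 42 positions, 14 are placeholders, so a literal comparison credits only 28 matches (28/42, about 67%). Once the substitution lists are consulted, every query residue is permitted and the identity is 100%.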

This happens because conventional sequence homology alignment algorithms were not designed with degenerate sequences in mind. Without special processing, they handle a degenerate sequence in one of two ways:

1) Inability to search for the sequence.

2) Exclusion of sequences due to similarity scores falling below the threshold. 

Both scenarios pose significant challenges for sequence searchers, as they not only impede the comparison of sequences with patent claims but also increase the likelihood of overlooking critical sequence information. 

Experience Degenerate Sequence Searching Now

In June 2023, Patsnap's biological sequence database (Bio) introduced a powerful degenerate sequence search feature, marking a significant shift for sequence searching in the patent domain. The feature gives researchers a robust tool backed by an extensive collection of degenerate sequences, helping them obtain accurate and relevant results from their searches.

To schedule a demo or learn more, visit patsnap.com/solutions/bio.

About Patsnap: Founded in 2007, Patsnap is the company behind the world's leading AI-powered innovation intelligence platform. Patsnap provides global businesses with a connected, easy-to-use platform that helps them make better decisions in the innovation process. Customers are innovators across multiple industry sectors, including agriculture and chemicals, consumer goods, food and beverage, life sciences, automotive, oil and gas, professional services, aviation and aerospace, and education.   

Source: Patsnap

Keywords: Biotechnology, Computer/Electronics, Internet Technology, Data Analytics, STEM (Science, Technology, Engineering, Mathematics)
