[README.md] Add more guide lines for regular expressions

author Sergey M <dstftw@gmail.com>

Tue, 1 Jan 2019 16:13:39 +0000 (23:13 +0700)

committer GitHub <noreply@github.com>

Tue, 1 Jan 2019 16:13:39 +0000 (23:13 +0700)
diff --git a/README.md b/README.md

index b3c39bf66ccf9f64785928919f8598f108b49bc6..bdc5faeec7bc68e91f7caf45527f6e82226ea89c 100644 (file)
--- a/README.md
+++ b/README.md
@@ -1133,11 +1133,33 @@ title = meta.get('title') or self._og_search_title(webpage)
  
  This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`.
  
-### Make regular expressions flexible
+### Regular expressions
  
-When using regular expressions try to write them fuzzy and flexible.
+#### Don't capture groups you don't use
+
+A capturing group should signal that its match is used somewhere in the code. Any group that is not used must be non-capturing.
+
+##### Example
+
+Don't capture the `id` attribute name here, since you can't use it for anything anyway.
+
+Correct:
+
+```python
+r'(?:id|ID)=(?P<id>\d+)'
+```
+
+Incorrect:
+```python
+r'(id|ID)=(?P<id>\d+)'
+```
+
+
+#### Make regular expressions relaxed and flexible
+
+When using regular expressions, try to write them fuzzy, relaxed and flexible: skip insignificant parts that are more likely to change, allow both single and double quotes for quoted values, and so on.
   
-#### Example
+##### Example
  
  Say you need to extract `title` from the following HTML code: