Filename : /opt/alt/python37/lib/python3.7/site-packages/charset_normalizer/__pycache__/api.cpython-37.pyc
Strings recoverable from the compiled module header:

Imported names:
annotations, logging, PathLike, BinaryIO
coherence_ratio, encoding_languages, mb_encoding_languages, merge_coherence_ratios
IANA_SUPPORTED, TOO_BIG_SEQUENCE, TOO_SMALL_SEQUENCE, TRACE
mess_ratio
CharsetMatch, CharsetMatches
any_specified_encoding, cut_sequence_chunks, iana_name, identify_sig_or_bom, is_cp_similar, is_multi_byte_encoding, should_strip_sig_or_bom

Logger name: charset_normalizer
Log format: %(asctime)s | %(levelname)s | %(message)s

Parameters of from_bytes: sequences, steps, chunk_size, threshold, cp_isolation, cp_exclusion, preemptive_behaviour, explain, language_threshold, enable_fallback
Annotation strings: bytes | bytearray, int, float, list[str] | None, bool
Docstring of from_bytes:

Given a raw bytes sequence, return the best possible charsets usable to render str objects. If there are no results, it is a strong indicator that the source is binary/not text.

By default, the process extracts 5 blocks of 512 bytes each to assess the mess and coherence of a given sequence, and gives up on a particular code page after 20% of measured mess. Those criteria are customizable at will.

The preemptive behaviour DOES NOT replace the traditional detection workflow; it prioritizes a particular code page but never takes it for granted. It can improve performance.

To focus on some code pages and/or ignore others, use cp_isolation and cp_exclusion.

This function strips the SIG from the payload/sequence every time, except for UTF-16 and UTF-32.

By default the library does not set up any handler other than the NullHandler. If you set the 'explain' toggle to True, it alters the logger configuration to add a StreamHandler that is suitable for debugging. A custom logging format and handler can be set manually.
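The sampling rule described in the docstring (a handful of fixed-size blocks, and a mess threshold that rejects a code page) can be illustrated with a small stdlib-only sketch. This is not the library's implementation: `crude_mess_ratio` and `probe_encoding` are hypothetical names, and the "mess" measure here (share of replacement or non-printable characters) is a deliberately crude stand-in chosen only to make the idea concrete. The defaults of 5 steps, 512 bytes and 0.2 mirror the figures quoted in the docstring.

```python
from __future__ import annotations


def crude_mess_ratio(text: str) -> float:
    """Hypothetical stand-in for a mess measure: share of suspicious characters."""
    if not text:
        return 0.0
    suspicious = sum(
        1
        for ch in text
        # U+FFFD marks bytes the decoder could not map; control characters
        # other than common whitespace are also counted as suspicious.
        if ch == "\ufffd" or (not ch.isprintable() and ch not in "\r\n\t")
    )
    return suspicious / len(text)


def probe_encoding(
    payload: bytes,
    encoding: str,
    steps: int = 5,
    chunk_size: int = 512,
    threshold: float = 0.2,
) -> bool:
    """Return True if `encoding` stays under the mess threshold on sampled chunks."""
    if not payload:
        return True
    # Spread `steps` chunk offsets evenly over the payload.
    stride = max(len(payload) // steps, 1)
    ratios = []
    for index in range(steps):
        start = index * stride
        # Cap the chunk at the stride so neighbouring samples do not overlap.
        chunk = payload[start:start + min(chunk_size, stride)]
        if not chunk:
            break
        # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
        text = chunk.decode(encoding, errors="replace")
        ratios.append(crude_mess_ratio(text))
    mean_mess = sum(ratios) / len(ratios)
    return mean_mess < threshold


if __name__ == "__main__":
    sample = ("éèàùç" * 200).encode("latin_1")
    for candidate in ("latin_1", "utf_8", "ascii"):
        print(candidate, probe_encoding(sample, candidate))
```

Sampling a few chunks instead of decoding the whole payload keeps the cost bounded for large inputs, which is presumably why the step count and chunk size are exposed as parameters.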
Source file recorded in the bytecode: /opt/alt/python37/lib/python3.7/site-packages/charset_normalizer/api.py

Messages and string constants embedded in from_bytes:

Expected object of type bytes or bytearray, got: {}
Encoding detection on empty bytes, assuming utf_8 intention.
cp_isolation is set. use this flag for debugging purpose. limited list of encoding allowed : %s.
cp_exclusion is set. use this flag for debugging purpose. limited list of encoding excluded : %s.
override steps (%i) and chunk_size (%i) as content does not fit (%i byte(s) given) parameters.
Trying to detect encoding from a tiny portion of ({}) byte(s).
Using lazy str decoding because the payload is quite large, ({}) byte(s).
Detected declarative mark in sequence. Priority +1 given for %s.
Detected a SIG or BOM mark on first %i byte(s). Priority +1 given for %s.
Encoding %s won't be tested as-is because it require a BOM. Will try some sub-encoder LE/BE.
Encoding %s won't be tested as-is because detection is unreliable without BOM/SIG.
Encoding %s does not provide an IncrementalDecoder
Code page %s does not fit given bytes sequence at ALL. %s
%s is deemed too similar to code page %s and was consider unsuited already. Continuing!
Code page %s is a multi byte encoding table and it appear that at least one character was encoded using n-bytes.
LazyStr Loading: After MD chunk decode, code page %s does not fit given bytes sequence at ALL. %s
LazyStr Loading: After final lookup, code page %s does not fit given bytes sequence at ALL. %s
%s was excluded because of initial chaos probing. Gave up %i time(s). Computed mean chaos is %f %%.
%s passed initial chaos probing. Mean measured chaos is %f %%
{} should target any language(s) of {}
We detected language {} using {}
Encoding detection: %s is most likely the one.
Encoding detection: %s is most likely the one as we detected a BOM or SIG within the beginning of the sequence.
Nothing got out of the detection process. Using ASCII/UTF-8/Specified fallback.
Encoding detection: %s will be used as a fallback match
Encoding detection: utf_8 will be used as a fallback match
Encoding detection: ascii will be used as a fallback match
Encoding detection: Found %s as plausible (best-candidate) for content. With %i alternatives.
Encoding detection: Unable to determine any suitable charset.

Encoding names used as constants: utf_8, ascii, utf_16, utf_32, utf_7
Keyword names used as constants: encoding, errors (with value strict), ndigits, preemptive_declaration

Names referenced by from_bytes (where legible): isinstance, bytearray, bytes, TypeError, format, type, logger, level, addHandler, explain_handler, setLevel, len, debug, removeHandler, logging, WARNING, log, join, append, set, add, ModuleNotFoundError, ImportError, str, UnicodeDecodeError, LookupError, range, max, decode, sum, round, encoding, best, fingerprint
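The messages above are emitted through a logger that stays silent unless the 'explain' toggle is set. A minimal sketch of that pattern with the standard logging module follows. It is an illustration, not the library's code: the logger name `explain_demo`, the function `detect_with_explain`, the `stream` parameter and the placeholder return value are all hypothetical, and the numeric value 5 for TRACE is an assumption (a level below DEBUG). Only the format string is taken from the dump.

```python
import io
import logging

TRACE = 5  # assumed value: a custom level below logging.DEBUG (10)

logger = logging.getLogger("explain_demo")  # hypothetical logger name
logger.addHandler(logging.NullHandler())    # silent by default


def detect_with_explain(payload: bytes, explain: bool = False, stream=None) -> str:
    """Hypothetical detector that logs its steps only when `explain` is True."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s | %(levelname)s | %(message)s")
    )
    previous_level = logger.level
    if explain:
        # Temporarily attach a stream handler and lower the level to TRACE.
        logger.addHandler(handler)
        logger.setLevel(TRACE)
    try:
        logger.log(TRACE, "Probing %i byte(s).", len(payload))
        return "utf_8"  # placeholder result, no real detection happens here
    finally:
        if explain:
            # Restore the logger so later calls are quiet again.
            logger.removeHandler(handler)
            logger.setLevel(previous_level)


if __name__ == "__main__":
    buffer = io.StringIO()
    detect_with_explain(b"abc", explain=True, stream=buffer)
    print(buffer.getvalue(), end="")
```

Restoring the handler list and level in a `finally` block keeps the toggle from leaking into later calls, including when detection raises.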